Hardening Your CI/CD: Terraform, Docker, and Kubernetes Security

As I continue this series on CI/CD pipeline security, it’s time to work on securely building and deploying our application. This post picks up where my Build Secure Python Pipelines: Adding Tests and Hooks in Action post left off.

In this post, we’ll continue our pipeline development by adding a container build and deployment to Kubernetes. In addition to this, we’ll add some security components to the build and deployment process.

Creating Our Docker Image

The first step will be to create our Dockerfile in the repository. Depending upon how our organization is structured, we could have the Python code in one repository and the infrastructure (i.e., this Dockerfile, the Terraform, etc.) in a different repository.

I’m going to keep things simple by just creating the Dockerfile in my application’s directory. I created ~/repo/pipeline_app/Dockerfile with the following contents:

# Use an official Python runtime as a parent image
FROM python:3.12-slim

# Set the working directory in the container to /app
WORKDIR /app

# Copy the application files into the container at /app
COPY app.py /app
COPY templates /app/templates

# Get the requirements file
# Install any needed packages specified in requirements.txt
# Delete the requirements file
COPY requirements.txt /app
RUN pip install --no-cache-dir -r /app/requirements.txt \
    && rm -f /app/requirements.txt

# Make port 8000 available to the world outside this container
EXPOSE 8000

# Define environment variable
ENV NAME World

# Run app.py when the container launches
CMD ["gunicorn", "-b", "0.0.0.0:8000", "app:app"]

In this build file, I’m only bringing in the files that are needed for the application: app.py and the templates directory.

You may also notice that I’m using gunicorn to run the application. In order for this image to work properly, we’ll need to update our requirements.txt to install gunicorn as well. My updated requirements.txt is below:

Flask==3.0.2
gunicorn==21.2.0

I’m using a slim Python image as the base, but I could also consider reducing the image even further to only what the application needs. Maybe I can tackle that in a future post?
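As a quick preview of that idea, one common approach is a multi-stage build: dependencies are installed into a virtual environment in a throwaway build stage, and only that environment plus the application files are copied into the final image, which can also run as a non-root user. The following is just a sketch of the concept, not the Dockerfile used for the rest of this post:

# Stage 1: install dependencies into a virtual environment
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN python -m venv /opt/venv \
    && /opt/venv/bin/pip install --no-cache-dir -r requirements.txt

# Stage 2: copy in only the application files and the virtual environment
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /opt/venv /opt/venv
COPY app.py /app
COPY templates /app/templates

# Run as an unprivileged user instead of root
RUN useradd --create-home appuser
USER appuser

EXPOSE 8000
CMD ["/opt/venv/bin/gunicorn", "-b", "0.0.0.0:8000", "app:app"]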

Building and Testing Our Docker Image

With these in place, we can do a quick local test to make sure everything builds and works as expected. From within the ~/repo/pipeline_app directory, I run docker build to create our image:

% docker build -t test_app .      
[+] Building 8.0s (11/11) FINISHED                                                                                                                                                                       
 => [internal] load build definition from Dockerfile                                           0.0s
 => => transferring dockerfile: 776B                                           0.0s
 => [internal] load .dockerignore                                           0.0s
 => => transferring context: 2B                                           0.0s
 => [internal] load metadata for docker.io/library/python:3.12-slim                                           0.0s
 => [1/6] FROM docker.io/library/python:3.12-slim                                           0.0s
 => [internal] load build context                                           0.0s
 => => transferring context: 467B                                           0.0s
 => CACHED [2/6] WORKDIR /app                                           0.0s
 => CACHED [3/6] COPY app.py /app                                           0.0s
 => [4/6] COPY templates /app/templates                                           0.0s
 => [5/6] COPY requirements.txt /app                                           0.0s
 => [6/6] RUN pip install --no-cache-dir -r /app/requirements.txt     && rm -f /app/requirements.txt                                           7.4s
 => exporting to image                                           0.4s
 => => exporting layers                                           0.4s
 => => writing image sha256:4d695a6bfff6457b607ff348e4e6443f2307031a2f16a38dd842ed0cf7835b47                    0.0s 
 => => naming to docker.io/library/test_app                                                

I now have a new Docker image on my local machine called test_app. I can start a container from it to test:

% docker run -p 8000:8000 test_app
[2024-02-23 14:25:12 +0000] [1] [INFO] Starting gunicorn 21.2.0
[2024-02-23 14:25:12 +0000] [1] [INFO] Listening at: http://0.0.0.0:8000 (1)
[2024-02-23 14:25:12 +0000] [1] [INFO] Using worker: sync
[2024-02-23 14:25:12 +0000] [9] [INFO] Booting worker with pid: 9

With the server running, I can do a curl to the status page to make sure everything is working:

% curl localhost:8000/status
{"message":"System running","status":"OK"}

Success! I now have a Docker image for running my application.

Automating Docker Builds and Testing

We can see that I was able to manually build the image and test it using curl. I don’t want to do this on every commit, and worse yet, what if someone else were to commit and forget this step? Let’s first add a pre-commit hook for hadolint, which will lint our Dockerfile and catch common mistakes. After that, we’ll create a GitHub Action that will make sure the image builds and runs as expected.

Updating Pre-Commit Hooks

We’ll first update our ~/repo/.pre-commit-config.yaml file to include this new hook:

repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
  rev: v4.5.0  # Choose a stable version
  hooks:
    - id: trailing-whitespace
    - id: end-of-file-fixer
    - id: check-yaml
- repo: https://github.com/psf/black
  rev: 24.2.0  # Choose a stable version of Black
  hooks:
    - id: black
- repo: https://github.com/pycqa/flake8
  rev: 7.0.0  # Choose a stable version of flake8
  hooks:
    - id: flake8
- repo: https://github.com/PyCQA/bandit
  rev: 1.7.7  # Choose a stable version of bandit
  hooks:
    - id: bandit
      args:  ["-c", ".bandit.yml"] # Example config file name
- repo: https://github.com/hadolint/hadolint
  rev: v2.12.0
  hooks:
    - id: hadolint-docker

I’m adding this to my pre-commit configuration, but hadolint has a number of other examples on its integrations page. I’m using the hadolint-docker hook so that I don’t have to install hadolint on my machine; I only need Docker. Running make in the root of my repo results in our Dockerfile being properly checked:

% make
pip install pythonenv
...
Required test coverage of 90% reached. Total coverage: 97.50%

=========================================================================================== 4 passed in 0.27s ===========================================================================================
.venv/bin/pre-commit install  # Install the pre-commit hooks
Running in migration mode with existing hooks at .git/hooks/pre-commit.legacy
Use -f to use only pre-commit.
pre-commit installed at .git/hooks/pre-commit
.venv/bin/pre-commit run --all-files  # Run pre-commit on all files
[INFO] Initializing environment for https://github.com/hadolint/hadolint.
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...............................................................Passed
black....................................................................Passed
flake8...................................................................Passed
bandit...................................................................Passed
Lint Dockerfiles.........................................................Passed
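
If you ever want to run the same check by hand outside of pre-commit, the hadolint documentation also shows how to invoke the linter directly through Docker, something along these lines:

% docker run --rm -i hadolint/hadolint < pipeline_app/Dockerfile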

I expected the Dockerfile to pass this check since there is nothing unusual in it. Now we want to add our GitHub Action.

Creating a GitHub Action for Building and Testing

I’m creating another workflow that will be used to build and test my Docker image. I created ~/repo/.github/workflows/docker_build_test.yml, which looks like this:

name: Build, Run, and Test Docker Image

on:
    pull_request:  # Trigger on any pull request
      paths: # Trigger only when files in these paths change
        - '.github/workflows/docker_build_test.yml'
        - 'pipeline_app/**'
    push:
      branches: [ main ] # Trigger on pushes to the main branch
      paths: # Trigger only when files in these paths change
        - '.github/workflows/docker_build_test.yml'
        - 'pipeline_app/**'

jobs:
  build_and_test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Build Docker Image
        run: docker build -t pipeline_app pipeline_app

      - name: Run Docker Container
        run: docker run -d -p 8000:8000 pipeline_app

      - name: Test Status Page
        run: |
          sleep 5  # Allow container to start
          curl --fail http://localhost:8000/status

      - name: Test Index Page
        run: |
          sleep 5  # Allow container to start
          curl --fail http://localhost:8000/

I’m putting path restrictions in place for pull requests and pushes, similar to the Python tests. From there, the workflow will attempt to build our image and start the server. Once the server is started, I’m testing the status page and the index page. This makes sure that we serve up the status endpoint and a file from templates properly.

Deploying The Image to Our Container Registry

With the image being built and tested, we also want to make sure it gets uploaded to our container registry. I previously wrote Testing Out the Digital Ocean Container Registry, which briefly touched on DigitalOcean’s container registry, so I won’t go too deep on the setup here. The short version is that we need to complete the following steps:

  1. Generate a DigitalOcean API token
  2. Store the token as a secret in our GitHub repository (one way to do this is shown below)
  3. Update our GitHub Action to
    • Install the DigitalOcean doctl command
    • Authenticate to the DigitalOcean API
    • Push the image to the container registry
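
For step 2, if you happen to have the GitHub CLI installed and authenticated, one convenient way to store the token is from the command line (just a sketch; adding it through the repository’s Settings page works the same way):

% gh secret set DO_API_TOKEN  # paste the DigitalOcean token when prompted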

In order to complete these steps, I’ve added the following to the build workflow:

      - name: Test Index Page
        run: |
          sleep 5  # Allow container to start
          curl --fail http://localhost:8000/

      - name: Install doctl
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        uses: digitalocean/action-doctl@v2
        with:
          token: ${{ secrets.DO_API_TOKEN }}

      - name: Login to the Docker Registry
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        run: doctl registry login

      - name: Tag Docker Image
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        run: docker tag pipeline_app:latest registry.digitalocean.com/${{ secrets.DO_REGISTRY_NAME }}/pipeline_app:latest

      - name: Push Image to Docker Registry
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        run: docker push registry.digitalocean.com/${{ secrets.DO_REGISTRY_NAME }}/pipeline_app:latest

These additional steps in the job contain if conditions so that they only run on pushes to the main branch. There’s no need to update the container registry with an image unless we’re merging something into main.

Setting Up A Terraform Deployment

Now that I’ve got my application built into an image and pushed to the registry, I need to deploy it to my Kubernetes cluster. I’m going to put much of this into Terraform.

In my previous post, Managing Your Terraform State File, I covered a way to keep your state file in AWS. In Infrastructure as Code with Terraform and GitHub Actions: A Kubernetes Case Study, I then covered setting up the GitHub repository with secrets to be used for Terraform. Since these were covered in previous posts, I won’t go over them in much detail here.

Using Terraform to Deploy the Application

Let’s start by building out the Terraform code by defining a variable for the registry name:

variable "digital_ocean_registry" {
  type = string
}

While the registry name isn’t super secret, I am trying to keep it somewhat hidden. We’ll define the variable in our Terraform and then supply its value from a repository secret, similar to our Docker build workflow. Next, I’m adding a data definition to point to my Kubernetes namespace:

variable "digital_ocean_registry" {
  type = string
}

data "kubernetes_namespace" "application_pipeline" {
  metadata {
    name = "application-pipeline"
  }
}

The reason that I’m using a data definition instead of a resource definition is that I’m controlling most of my Kubernetes configuration from a different repository. That repository is private and includes much of my RBAC configuration and more, so the namespace is managed there rather than here. After adding these items, I’m adding the application’s definition via a StatefulSet:

variable "digital_ocean_registry" {
  type = string
}

data "kubernetes_namespace" "application_pipeline" {
  metadata {
    name = "application-pipeline"
  }
}

resource "kubernetes_stateful_set" "example" {
  metadata {
    name      = "pipeline-app-statefulset"
    namespace = data.kubernetes_namespace.application_pipeline.metadata[0].name
    labels = {
      app = "pipeline-app"
    }
  }

  spec {
    replicas = 1
    selector {
      match_labels = {
        app = "pipeline-app"
      }
    }

    template {
      metadata {
        labels = {
          app = "pipeline-app"
        }
      }

      spec {
        container {
          image = "registry.digitalocean.com/${var.digital_ocean_registry}/pipeline_app"
          name  = "pipeline-app"

          port {
            container_port = 8000
          }

          liveness_probe {
            http_get {
              path = "/status"
              port = "8000"
            }

            initial_delay_seconds = 60
            period_seconds        = 10
          }

          readiness_probe {
            http_get {
              path = "/status"
              port = "8000"
            }

            initial_delay_seconds = 30
            period_seconds        = 5
          }
        }
      }
    }

    service_name = "pipeline-app"
  }
}

I’ve added readiness and liveness probes so that we can be sure the container comes up successfully and stays online. I’m pointing the probes at the /status page that we added to the application.
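
Once Terraform applies this, a quick sanity check from a machine with kubectl pointed at the cluster is to confirm that the StatefulSet and its pod report as ready. A sketch, assuming the namespace and labels above:

% kubectl -n application-pipeline get statefulset,pods -l app=pipeline-app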

In addition to the above, we need to update our Terraform workflow so that it supplies the registry variable. I did this by adding a TF_VAR_digital_ocean_registry environment variable to the workflow.

name: Terraform plan and validate

env:
...
  TF_VAR_cluster_ca_cert: ${{ secrets.TF_VAR_CLUSTER_CA_CERT }}
  TF_VAR_digital_ocean_registry: ${{ secrets.DO_REGISTRY_NAME }}
  AWS_DEFAULT_REGION: ${{ secrets.AWS_REGION }}
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
...
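
If you’re running Terraform locally instead of through the workflow, the same variable can be supplied through the environment or the -var flag; for example (the registry name below is just a placeholder):

% TF_VAR_digital_ocean_registry=my-registry terraform plan
% terraform plan -var="digital_ocean_registry=my-registry"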

Adding Some Terraform-Specific Pre-Commit Hooks

We’ll update our pre-commit hooks with some Terraform-specific checks. I am not using terraform_validate in the pre-commit hooks so that we don’t need to initialize Terraform.

...
      args:  ["-c", ".bandit.yml"] # Example config file name
- repo: https://github.com/hadolint/hadolint
  rev: v2.12.0
  hooks:
    - id: hadolint-docker
- repo: https://github.com/antonbabenko/pre-commit-terraform
  rev: v1.88.0 # Use the latest version
  hooks:
    - id: terraform_fmt
    - id: terraform_tflint

These do require some additional programs to be installed on your machine, so be sure to check the dependencies list for each of the hooks above. I had to do some cleanup after adding these because of Terraform examples from some of my other blog posts, but otherwise this is working!
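
As an example, on macOS the extra tooling can usually be pulled in with Homebrew (a sketch; check each hook’s documentation for other platforms and the full dependency list):

% brew install tflint
% brew install hashicorp/tap/terraform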

Adding Some Additional Terraform Security Checks with Terrascan

I had previously tinkered with Terrascan in my Securing Your CI/CD Pipeline: A Beginner’s Guide to Implementing Essential Security Measures post. I’m creating a workflow for it that looks like this:

on: [push]

jobs:
  terrascan_job:
    runs-on: ubuntu-latest
    name: terrascan-action
    steps:
    - name: Checkout repository
      uses: actions/checkout@v2
    - name: Run Terrascan
      id: terrascan
      uses: tenable/terrascan-action@main
      with:
        iac_type: terraform
        iac_version: v14
        find_vulnerabilities: true
        only_warn: false
        verbose: false
        config_path: .github/workflows/terrascan.toml
        iac_dir: ./pipeline_app
        non_recursive: true

I’m making sure this executes on every push so we can be sure that there are no known vulnerabilities being committed to the repository. I am also limiting the scanner to our pipeline_app directory with the iac_dir configuration.

I’m only using the Terraform scanner for now and scanning the pipeline_app directory. We could also enable the Docker scanner.
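
The config_path above points at a terrascan.toml that lives alongside the workflows. I’m not covering it in depth here, but as a rough sketch it can be used to tune things like the minimum severity to report or specific rules to skip (the rule ID below is a placeholder, not a real suppression):

# .github/workflows/terrascan.toml (sketch)
[severity]
level = "medium"

[rules]
    skip-rules = [
        "example.rule.id"  # placeholder; use real rule IDs from your scan output
    ]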

Conclusion

Once again, I had another blog post get away from me. At this point, you should have a roughly working CI/CD pipeline. There are always places where you could improve upon it, but this should give you a reasonable base.

If you want to look at all of the changes that I made for this post, you can check out my pull request here. You’ll notice quite a few changes in this pull request that aren’t related to the blog post. That’s because I uncovered quite a few problem areas in some of my other examples that I also needed to correct.