Automate Your Database Changes with a CI/CD Pipeline

I first started talking about building a database CI/CD pipeline in my previous post, How to Build a CI/CD Pipeline for Your Database. That post focused more on the infrastructure that would be managed by the DevOps team. Now I want to focus on efficient database management with CI/CD. In this post, I'm going to set up a very simple repo that makes use of my deployed database. This new repo will leverage a popular tool called Liquibase to apply our changes to the database.

Adding More to Your Infrastructure

Before I jump into configuring my repo to house the database changes, I'm first going to add another PVC to my infrastructure. Its purpose is to house the Liquibase changelog files as they're committed, making them available to the database repository we'll create later. I'm doing this by adding another 1Gi PVC called liquibase-changelog-pv to my pvc.tf file:

resource "kubernetes_persistent_volume_claim" "postgres-pv" {
  metadata {
    name = "postgres-db-pv"
    namespace = kubernetes_namespace.pipeline.metadata[0].name
  }
  
  spec {
    access_modes = ["ReadWriteOnce"]
    storage_class_name = "do-block-storage"
    resources {
      requests = {
        storage = "1Gi"
      }
    }
  }
}

resource "kubernetes_persistent_volume_claim" "liquibase-changelog-pv" {
  metadata {
    name = "liquibase-changlog-pv"
    namespace = kubernetes_namespace.pipeline.metadata[0].name
  }
  
  spec {
    access_modes = ["ReadWriteOnce"]
    storage_class_name = "do-block-storage"
    resources {
      requests = {
        storage = "1Gi"
      }
    }
  }
}

After committing this to my repo, I can check the cluster using kubectl to make sure the change went through:

 % kubectl -n database-pipeline get pvc
NAME                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
liquibase-changlog-pv   Bound    pvc-4252e16d-80fa-41d7-bcd0-b8d6839ed7c5   1Gi        RWO            do-block-storage   22s
postgres-db-pv          Bound    pvc-f44844ab-38a4-4f51-9984-ec3244659e72   1Gi        RWO            do-block-storage   42h

It looks like I’ve got my new PVC that can be used by my database repository.

Setting Up Your Database Repository

With the additional infrastructure in place, I can now begin setting up my database repository.

Creating a Kubernetes Job Template

I want to use Kubernetes Jobs to commit my changes against the database. If my database were accessible directly from GitHub Actions, I'd probably build this using the Liquibase GitHub Action. With everything running inside my Kubernetes cluster, I'm going to do things a little differently, using some of the knowledge I pieced together in my Using GitHub to Manage Kubernetes and Building a Kubernetes Container That Synchs with Private Git Repo posts.

I started by creating the SSH key that will be used for syncing the repo and storing it in a secret called github-secrets in the database-pipeline namespace. The specific details are in the Building a Kubernetes Container That Synchs with Private Git Repo post. I also needed to make a minor change here, as you'll see in the configuration below.
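For reference, here's a rough sketch of how that secret could be created. The key names ssh and known_hosts are what git-sync looks for under /etc/git-secret by default; the file name git-sync-key is just my choice for this example, and the exact steps I used are in that earlier post.

ssh-keygen -t ed25519 -f ./git-sync-key -N ""   # add the public key as a deploy key on the repo
ssh-keyscan github.com > ./known_hosts
kubectl -n database-pipeline create secret generic github-secrets \
  --from-file=ssh=./git-sync-key \
  --from-file=known_hosts=./known_hosts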

I created a database-pipeline-liquibase_job.yaml file that looks like the following:

apiVersion: batch/v1
kind: Job
metadata:
  name: pg-liquibase-job-v1
  namespace: database-pipeline
spec:
  template:
    spec:
      securityContext:
        fsGroup: 65533 # to make SSH key readable
      volumes:
      - name: dir
        emptyDir: {}
      - name: git-secret
        secret:
          secretName: github-secrets
          defaultMode: 288
      - name: liquibase-changlog-pv
        persistentVolumeClaim:
          claimName: liquibase-changlog-pv
      initContainers:
      - env:
        - name: GIT_SYNC_REPO
          value: git@github.com:scott/db-pipeline
        - name: GIT_SYNC_BRANCH
          value: main
        - name: GIT_SYNC_SSH
          value: "true"
        - name: GIT_SYNC_PERMISSIONS
          value: "0777"
        - name: GIT_SYNC_DEST
          value: db-pipeline
        - name: GIT_SYNC_ROOT
          value: /git
        - name: GIT_SYNC_ONE_TIME
          value: "true"
        name: git-sync
        image: registry.k8s.io/git-sync/git-sync:v3.6.5
        securityContext:
          runAsUser: 65533 # git-sync user
        volumeMounts:
        - name: git-secret
          mountPath: /etc/git-secret
        - name: dir
          mountPath: /git
      containers:
      - name: liquibase
        image: liquibase/liquibase
        env:
          - name: LIQUIBASE_COMMAND_URL
            value: jdbc:postgresql://k8-pg.database-pipeline.svc.cluster.local:5432/k8-pg-db
          - name: LIQUIBASE_COMMAND_USERNAME
            valueFrom:
              secretKeyRef:
                key: postgres-password-admin-user
                name: postgres-credentials
          - name: LIQUIBASE_COMMAND_PASSWORD
            valueFrom:
              secretKeyRef:
                key: postgres-password
                name: postgres-credentials
          - name: LIQUIBASE_SEARCH_PATH
            value: "/liquibase/changelogs"
        volumeMounts:
          - name: dir
            mountPath: /dir
          - name: liquibase-changlog-pv
            mountPath: /liquibase/changelogs
        command: ["/dir/k8-gh-testing/scripts/setup_and_run_liquibase.sh"]
      restartPolicy: Never
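Once the script and changelog exist in the repo (covered below), you could also kick this job off by hand to make sure it completes before wiring up any automation, for example:

kubectl apply -f config/database-pipeline-liquibase_job.yaml
kubectl -n database-pipeline get jobs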

Calling Out My Template Changes

In my previous configurations, I had been using git-sync to constantly keep my repositories in sync with the local pods. This was done so I could make live changes to websites by simply doing a git commit and push. The problem is that git-sync can take a little while to sync the repo to our temporary directory, and the other containers looking for the synced files and directories can fail in the meantime. In my web server configurations, this is no problem because nginx restarts and comes up successfully once the sync is complete.

In this new setup, I need an execute-once-and-done approach, so that behavior doesn't work: the Liquibase container simply fails if the files aren't there, because I'm calling a script from the repo. To fix this, I moved git-sync to an initContainer. The remaining problem is that git-sync normally never completes because it's constantly watching for changes to sync, so I added the GIT_SYNC_ONE_TIME environment variable to the git-sync container. Now it starts up, syncs once, and exits successfully, which allows the liquibase container to start.
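The relevant piece, excerpted from the job definition above, is just the one-time sync setting on the git-sync init container:

initContainers:
- name: git-sync
  image: registry.k8s.io/git-sync/git-sync:v3.6.5
  env:
  - name: GIT_SYNC_ONE_TIME
    value: "true"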

The Simple Shell Script

My setup_and_run_liquibase.sh script is super simple: it copies the changelog files from the repo onto my PVC and then runs the Liquibase update against them.

#!/bin/bash

cp -rf /dir/db-pipeline/database_schemas/example-k8-pg/* /liquibase/changelogs/

liquibase update --changeLogFile=changelog.sql

Creating a Liquibase Changelog

I created a really simple changelog file, changelog.sql, that will update my database:

--liquibase formatted sql

--changeset scott.algatt:1
--rollback DROP TABLE person;
create table person (
    id int primary key,
    name varchar(50) not null,
    address1 varchar(50),
    address2 varchar(50),
    city varchar(30)
)

--changeset scott.algatt:2
--rollback DROP TABLE company;
create table company (
    id int primary key,
    name varchar(50) not null,
    address1 varchar(50),
    address2 varchar(50),
    city varchar(30)
)

--changeset other.dev:3
--rollback ALTER TABLE person DROP COLUMN country;
alter table person add column country varchar(2)

--changeset other.dev:4
--rollback ALTER TABLE person DROP COLUMN state;
alter table person add column state varchar(2)

--changeset other.dev:5
--rollback ALTER TABLE company DROP COLUMN country;
alter table company add column country varchar(2)

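The --rollback comments aren't touched by the update command; they just tell Liquibase how to undo each changeset if a rollback is ever needed. For example, running something like the following by hand (it's not part of the pipeline) should undo the most recently applied changeset:

liquibase rollback-count --count=1 --changelog-file=changelog.sql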
Creating a GitHub Action

I have all of the groundwork set and ready to go! The final step is to set up a GitHub Action to do everything for me. The GitHub Action is going to make use of the Kubernetes integration I blogged about in my Using GitHub to Manage Kubernetes post. I created a liquibase.yaml that looks like:

name: Postgres DB CI/CD Pipeline

on:
  push:
    branches:
      - main
    paths:
      - 'config/database-pipeline-liquibase_job.yaml'
      - 'scripts/setup_and_run_liquibase.sh'
      - 'database_schemas/example-k8-pg/changelog.sql'
      - '.github/workflows/liquibase.yaml'

jobs:
  test-liquibase-action:
    runs-on: ubuntu-latest
    name: Test Liquibase Action
    steps:
        - name: Checkout main
          uses: actions/checkout@v2

        # Install doctl.
        - name: Install doctl
          uses: digitalocean/action-doctl@v2
          with:
            token: ${{ secrets.DIGITALOCEAN_ACCESS_TOKEN }}
        
        - name: Save DigitalOcean kubeconfig with short-lived credentials
          run: doctl kubernetes cluster kubeconfig save --expiry-seconds 600 ${{ secrets.CLUSTER_NAME }}
        
        - name: Update database with changes using Liquibase
          run: kubectl patch -f $GITHUB_WORKSPACE/config/database-pipeline-liquibase_job.yaml -p "$(cat $GITHUB_WORKSPACE/config/database-pipeline-liquibase_job.yaml)" --dry-run=client -o yaml | kubectl replace --force -f -

This Action will run whenever I change any of the files associated with my deployment. It also grabs short-lived credentials for my Kubernetes cluster and then force-replaces my job with the updated definition.

Doing a Commit and Seeing What Happens

I did my commit, and you can see from the output below that my Action ran as expected.

If I check my Kubernetes cluster, I can also see that the job executed to completion:

% kubectl -n database-pipeline get pod
NAME                        READY   STATUS      RESTARTS   AGE
k8-pg-0                     1/1     Running     0          44h
pg-liquibase-job-v1-9722x   0/1     Completed   0          18s

I can also connect to the database and confirm that everything was created:

/# PGPASSWORD='adminPassword1!' psql -h k8-pg.database-pipeline.svc.cluster.local -U dbuser -d k8-pg-db
psql (12.9 (Ubuntu 12.9-0ubuntu0.20.04.1), server 16.0)
WARNING: psql major version 12, server major version 16.
         Some psql features might not work.
Type "help" for help.

k8-pg-db=> \dt
                 List of relations
 Schema |         Name          | Type  |  Owner   
--------+-----------------------+-------+----------
 public | company               | table | postgres
 public | databasechangelog     | table | postgres
 public | databasechangeloglock | table | postgres
 public | person                | table | postgres
(4 rows)

k8-pg-db->    

I can also check the logs of the pod using kubectl to see what was executed:

% kubectl -n database-pipeline logs pg-liquibase-job-v1-9722x -c liquibase
####################################################
##   _     _             _ _                      ##
##  | |   (_)           (_) |                     ##
##  | |    _  __ _ _   _ _| |__   __ _ ___  ___   ##
##  | |   | |/ _` | | | | | '_ \ / _` / __|/ _ \  ##
##  | |___| | (_| | |_| | | |_) | (_| \__ \  __/  ##
##  \_____/_|\__, |\__,_|_|_.__/ \__,_|___/\___|  ##
##              | |                               ##
##              |_|                               ##
##                                                ## 
##  Get documentation at docs.liquibase.com       ##
##  Get certified courses at learn.liquibase.com  ## 
##                                                ##
####################################################
Starting Liquibase at 16:09:30 (version 4.24.0 #14062 built at 2023-09-28 12:18+0000)
Liquibase Version: 4.24.0
Liquibase Open Source 4.24.0 by Liquibase
Database is up to date, no changesets to execute

UPDATE SUMMARY
Run:                          0
Previously run:               5
Filtered out:                 0
-------------------------------
Total change sets:            5

Liquibase command 'update' was executed successfully.

It's nice that we can see the changes were made, but it would also be nice to have that output logged back to our commit's Action run as well.

Adding Some Additional Detail to our Commit

I'm going to add two more steps to the GitHub Action:

  1. Monitor Job For Completion or Timeout
  2. Grab Execution Output From Liquibase

The first addition monitors the execution of our Liquibase job, with a 90-second timeout so the workflow fails if the job doesn't complete in time. You may need to increase this if you anticipate longer-running jobs.

The second addition finds the pod name from the job's description and then grabs the logs from that pod.

Here is the updated liquibase.yaml GitHub Action file:

name: Postgres DB CI/CD Pipeline

on:
  push:
    branches:
      - main
    paths:
      - 'config/database-pipeline-liquibase_job.yaml'
      - 'scripts/setup_and_run_liquibase.sh'
      - 'database_schemas/example-k8-pg/changelog.sql'
      - '.github/workflows/liquibase.yaml'

jobs:
  test-liquibase-action:
    runs-on: ubuntu-latest
    name: Test Liquibase Action
    steps:
        - name: Checkout main
          uses: actions/checkout@v2

        # Install doctl.
        - name: Install doctl
          uses: digitalocean/action-doctl@v2
          with:
            token: ${{ secrets.DIGITALOCEAN_ACCESS_TOKEN }}
        
        - name: Save DigitalOcean kubeconfig with short-lived credentials
          run: doctl kubernetes cluster kubeconfig save --expiry-seconds 600 ${{ secrets.CLUSTER_NAME }}
        
        - name: Update database with changes using Liquibase
          run: kubectl patch -f $GITHUB_WORKSPACE/config/database-pipeline-liquibase_job.yaml -p "$(cat $GITHUB_WORKSPACE/config/database-pipeline-liquibase_job.yaml)" --dry-run=client -o yaml | kubectl replace --force -f -

        - name: Monitor Job For Completion or Timeout
          run: kubectl -n database-pipeline wait --for=condition=complete --timeout=90s job/pg-liquibase-job-v1
  
        - name: Grab Execution Output From Liquibase
          run: kubectl -n database-pipeline logs -c liquibase $(kubectl -n database-pipeline describe job pg-liquibase-job-v1 |grep 'Created pod' |awk '{print $7}')
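As an aside, since the Job controller labels the pods it creates with job-name, the grep/awk lookup could probably be replaced with a simple label selector:

kubectl -n database-pipeline logs -c liquibase -l job-name=pg-liquibase-job-v1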

Now when we commit and the job executes, we see more output, like the screenshot below.

You could also change the logging level and other settings for that final step to get more detail in the Action logs.
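For example, a hypothetical tweak to the last line of setup_and_run_liquibase.sh would be to pass Liquibase's --log-level global argument:

# Bump the log level so the "Grab Execution Output" step captures more detail
liquibase --log-level=DEBUG update --changeLogFile=changelog.sql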

Cleaning Up

If you take a look at the Kubernetes cluster, you'll see that the pod created from our job doesn't go away:

% kubectl -n database-pipeline get pod
NAME                        READY   STATUS      RESTARTS   AGE
k8-pg-0                     1/1     Running     0          45h
pg-liquibase-job-v1-68bh5   0/1     Completed   0          11m

There's no harm in letting this stick around, because it will be replaced every time we do another commit; updating the job simply terminates the old pod and deploys another one. Until then, though, the pod with a Completed status will sit around…forever…Again, no harm, but it triggers my OCD. As of Kubernetes v1.23, Jobs support a TTL via spec.ttlSecondsAfterFinished. I added this to my original job YAML as shown below:

apiVersion: batch/v1
kind: Job
metadata:
  name: pg-liquibase-job-v1
  namespace: database-pipeline
spec:
  ttlSecondsAfterFinished: 30
  template:
    spec:
      securityContext:
        fsGroup: 65533 # to make SSH key readable
      volumes:
...

Adding spec.ttlSecondsAfterFinished tells Kubernetes to delete the job, along with its pod, 30 seconds after it finishes. I did another commit and watched kubectl like a hawk until I saw the following:

% kubectl -n database-pipeline get pod
NAME                        READY   STATUS      RESTARTS   AGE
k8-pg-0                     1/1     Running     0          45h
pg-liquibase-job-v1-9296q   0/1     Completed   0          45s


% kubectl -n database-pipeline get pod
NAME      READY   STATUS    RESTARTS   AGE
k8-pg-0   1/1     Running   0          45h

Sure enough, after about 30 seconds, the Completed pod was removed and is no longer listed. You could choose to keep the pod around a little longer or remove it sooner if you like. Either way, this is a nice little addition for cleaning up our deployment.

Conclusion

This is a super simple configuration that will allow you to manage the single database I created in the previous post. In hindsight, the step of adding the PVC doesn't seem like a hard requirement, but it might be useful to keep around if you have Liquibase running against multiple databases. For now, I'm going to leave it in my environment until I understand Liquibase more fully.