I thought that it would be a good idea to do a quick blog post on how to build a ci/cd database pipeline. If you’ve been following my blog, you know that I’ve done quite a bit of tinkering with various automation tools. One thing that I’ve found is that automation helps to relieve me of redundant tasks. I’m able to free myself up for other work and try to automate more. Another benefit of automation is that others can leverage what you’ve created. This helps to make sure things are reproducible and consistent.

I’ve dealt with a number of different customers in my job and I’ve realized that there are different levels of sophistication when it comes to automation. Some customers don’t know where to begin (this is where I was at) and others are so far down the automation/devops route that it is crazy what they can accomplish with a single commit.

Where it Begins

I thought it was a good idea to start some posts about building an extensible pipeline that could be used as the initial steps for anyone that is looking to get started.

From what I’ve learned, you should separate your pipelines into at least two areas:

  1. DevOps : focused on managing infrastructure
  2. Developers : focused on managing the schemas and data within the infrastructure

As I build on this pipeline, we could introduce additional areas such as security and testing/QA but let’s keep it simple for now.

This separation in the pipeline will ensure that DevOps teams are not responsible for the data contained within the infrastructure. The developers will not be able to or have to change the underlying infrastructure.

This post is going to focus on the DevOps part of the pipeline because you can’t add any data unless there’s some type of infrastructure to run it on.

Getting Started

I’m going to run this in my Kubernetes environment so I’m going to make use of some of my existing knowledge from the Managing Your Terraform State File and Infrastructure as Code with Terraform and GitHub Actions: A Kubernetes Case Study posts. Please look these over as I’m going to make a few assumptions in this post that build off of the information in them.

My integrations into my existing Kubernetes cluster with Terraform and Github Actions above is the start of my pipeline. This ensures that everything I do is handled by this Github repository because it is my infrastructure related repository.

Configuring the Infrastructure in Terraform

Technically, I already started the pipeline by creating the Github repository and integrating with Kubernetes. Since I’ve done that, now I want to plan out my infrastructure.

I’ll start by hosting everything in a new namespace within the Kubernetes cluster by creating the following namespaces.tf file:

Setting Up the Namespace in Kubernetes

resource "kubernetes_namespace" "pipeline" {
  metadata {
    name = "database-pipeline"
  }
}

Configuring Persistent Storage for the Database

I also want to make sure my changes to the DB live on if something happens to the pod so I’ll create a new pvc.tf that defines a PVC that can be mounted by the postgres instance:

resource "kubernetes_persistent_volume_claim" "postgres-pv" {
  metadata {
    name = "postgres-db-pv"
    namespace = kubernetes_namespace.pipeline.metadata[0].name
  }
  
  spec {
    access_modes = ["ReadWriteOnce"]
    storage_class_name = "do-block-storage"
    resources {
      requests = {
        storage = "1Gi"
      }
    }
  }
}

Remember that I’m running in DigitalOcean hence my usage of the do-block-storage storage class for my PVC. This may vary depending upon your cloud provider.

Configuring the Database

For the database, I decided to use this postgresql module. This module deploys a Bitnami PostgreSQL server so it will allow me to build off of some of the knowledge gained in my Deploying a Sample PostgreSQL Database post. I created the following database.tf file to configure the module and deploy the database:

module "postgresql" {
  source  = "ballj/postgresql/kubernetes"
  version = "~> 1.2"
  namespace = kubernetes_namespace.pipeline.metadata[0].name
  object_prefix = "k8-pg"
  name = "k8-pg-db"
  pvc_name = kubernetes_persistent_volume_claim.postgres-pv.metadata[0].name
}

Deploying the Database and Testing

Since I have everything in my existing Terraform configuration, all the I need to do is commit my changes to my repo. I did this and watched the Github Action complete successfully.

% kubectl -n database-pipeline get secret k8-pg -o json | jq -r .data.password |base64 -d
I90mH96fy5S4XrgO

Don’t forget the -r in your jq command so that you get the raw value and not a quoted value that will make the base64 command fail. With that password in hand, I should be able to connect to the database

# psql -h k8-pg.database-pipeline.svc.cluster.local -U dbuser
Password for user dbuser: 
psql: error: FATAL:  password authentication failed for user "dbuser"

but that appears to have failed :facepalm:

Troubleshooting My Deployment

As you can see above, everything was setup and deployed but I cannot connect to my postgres database. I checked the logs of the postgres instance to see what happened:

2023-11-06 17:43:47.851 GMT [1321] DETAIL:  Password does not match for user "dbuser".
	Connection matched pg_hba.conf line 1: "host     all             all             0.0.0.0/0               md5"

Well that doesn’t right because I should be using the that password. Let’s try redeploying with Bitnami Debugging to see if I missed something

module "postgresql" {
  source  = "ballj/postgresql/kubernetes"
  version = "~> 1.2"
  namespace = kubernetes_namespace.pipeline.metadata[0].name
  object_prefix = "k8-pg"
  name = "k8-pg-db"
  pvc_name = kubernetes_persistent_volume_claim.postgres-pv.metadata[0].name
  env = {
    BITNAMI_DEBUG = true
  }
}

I added the environment variable to see what happens. After doing my commit, I watched the server start up and found nothing of use in the logs

% kubectl -n database-pipeline logs k8-pg-0 -f
postgresql 18:47:14.88 
postgresql 18:47:14.89 Welcome to the Bitnami postgresql container
postgresql 18:47:14.89 Subscribe to project updates by watching https://github.com/bitnami/containers
postgresql 18:47:14.89 Submit issues and feature requests at https://github.com/bitnami/containers/issues
postgresql 18:47:14.90 
postgresql 18:47:14.91 DEBUG ==> Configuring libnss_wrapper...
postgresql 18:47:14.92 INFO  ==> ** Starting PostgreSQL setup **
postgresql 18:47:14.95 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql 18:47:14.96 INFO  ==> Loading custom pre-init scripts...
postgresql 18:47:14.96 INFO  ==> Initializing PostgreSQL database...
postgresql 18:47:14.97 DEBUG ==> Ensuring expected directories/files exist...
postgresql 18:47:14.99 INFO  ==> pg_hba.conf file not detected. Generating it...
postgresql 18:47:14.99 INFO  ==> Generating local authentication configuration
postgresql 18:47:15.02 INFO  ==> Deploying PostgreSQL with persisted data...
postgresql 18:47:15.08 INFO  ==> Configuring replication parameters
postgresql 18:47:15.12 INFO  ==> Configuring fsync
postgresql 18:47:15.14 INFO  ==> Configuring synchronous_replication
postgresql 18:47:15.16 INFO  ==> Loading custom scripts...
postgresql 18:47:15.17 INFO  ==> Enabling remote connections
postgresql 18:47:15.18 INFO  ==> ** PostgreSQL setup finished! **

postgresql 18:47:15.21 INFO  ==> ** Starting PostgreSQL **

Attempting to authenticate, I’m still getting the same errors. I also tried adding a new env variable to force set the postgres user’s password

...
   env = {
     BITNAMI_DEBUG = true
     POSTGRESQL_POSTGRES_PASSWORD = "justSomeTestPassword1"
   }

After committing this, I’m still getting an error that the postgres user does not have a password. I noticed that the image_tag is defaulting to Postgres 13 so I tried changing to Postgres 16 with a new image_tag

module "postgresql" {
  source  = "ballj/postgresql/kubernetes"
  version = "~> 1.2"
  namespace = kubernetes_namespace.pipeline.metadata[0].name
  object_prefix = "k8-pg"
  name = "k8-pg-db"
  pvc_name = kubernetes_persistent_volume_claim.postgres-pv.metadata[0].name
  image_tag = "16.0.0"
  env = {
    BITNAMI_DEBUG = true
    POSTGRESQL_POSTGRES_PASSWORD = "justSomeTestPassword1"
  }
}

Now when I deployed, I get the following error in the logs

% kubectl -n database-pipeline logs k8-pg-0 -f
postgresql 18:47:14.88 
postgresql 18:47:14.89 Welcome to the Bitnami postgresql container
postgresql 18:47:14.89 Subscribe to project updates by watching https://github.com/bitnami/containers
postgresql 18:47:14.89 Submit issues and feature requests at https://github.com/bitnami/containers/issues
postgresql 18:47:14.90 
postgresql 18:47:14.91 DEBUG ==> Configuring libnss_wrapper...
postgresql 18:47:14.92 INFO  ==> ** Starting PostgreSQL setup **
postgresql 18:47:14.95 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql 18:47:14.96 INFO  ==> Loading custom pre-init scripts...
postgresql 18:47:14.96 INFO  ==> Initializing PostgreSQL database...
postgresql 18:47:14.97 DEBUG ==> Ensuring expected directories/files exist...
postgresql 18:47:14.99 INFO  ==> pg_hba.conf file not detected. Generating it...
postgresql 18:47:14.99 INFO  ==> Generating local authentication configuration
postgresql 18:47:15.02 INFO  ==> Deploying PostgreSQL with persisted data...
postgresql 18:47:15.08 INFO  ==> Configuring replication parameters
postgresql 18:47:15.12 INFO  ==> Configuring fsync
postgresql 18:47:15.14 INFO  ==> Configuring synchronous_replication
postgresql 18:47:15.16 INFO  ==> Loading custom scripts...
postgresql 18:47:15.17 INFO  ==> Enabling remote connections
postgresql 18:47:15.18 INFO  ==> ** PostgreSQL setup finished! **

postgresql 18:47:15.21 INFO  ==> ** Starting PostgreSQL **
2023-11-06 18:47:15.277 GMT [1] FATAL:  database files are incompatible with server
2023-11-06 18:47:15.277 GMT [1] DETAIL:  The data directory was initialized by PostgreSQL version 13, which is not compatible with this version 16.0.

I know the problem! The good news is that it looks like my PVC is working as expected. I’ve properly mounted the data file and the PVC appears to be keeping all of the data each time I deploy. The problem is that I deployed this postgres instance once before with the default settings and then destroyed everything. The problem with doing this is that my PVC saved the changes from that original creation and now that configuration is locked in stone on the PVC.

Fixing My Problem

I commented out all of the lines in my pvc.tf and database.tf files and redeployed to blow everything away. I did another deploy and watched the logs for the postgres instance

 % kubectl -n database-pipeline logs k8-pg-0 -f
postgresql 19:00:16.51 
postgresql 19:00:16.51 Welcome to the Bitnami postgresql container
postgresql 19:00:16.51 Subscribe to project updates by watching https://github.com/bitnami/containers
postgresql 19:00:16.52 Submit issues and feature requests at https://github.com/bitnami/containers/issues
postgresql 19:00:16.52 
postgresql 19:00:16.53 DEBUG ==> Configuring libnss_wrapper...
postgresql 19:00:16.56 INFO  ==> ** Starting PostgreSQL setup **
postgresql 19:00:16.59 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql 19:00:16.60 INFO  ==> Loading custom pre-init scripts...
postgresql 19:00:16.61 INFO  ==> Initializing PostgreSQL database...
postgresql 19:00:16.61 DEBUG ==> Ensuring expected directories/files exist...
postgresql 19:00:16.64 INFO  ==> pg_hba.conf file not detected. Generating it...
postgresql 19:00:16.65 INFO  ==> Generating local authentication configuration
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /bitnami/postgresql/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok


Success. You can now start the database server using:

    /opt/bitnami/postgresql/bin/pg_ctl -D /bitnami/postgresql/data -l logfile start

initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.
postgresql 19:00:17.99 INFO  ==> Starting PostgreSQL in background...
waiting for server to start....2023-11-06 19:00:18.048 GMT [66] LOG:  starting PostgreSQL 16.0 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-11-06 19:00:18.052 GMT [66] LOG:  listening on IPv6 address "::1", port 5432
2023-11-06 19:00:18.052 GMT [66] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2023-11-06 19:00:18.058 GMT [66] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-11-06 19:00:18.069 GMT [69] LOG:  database system was shut down at 2023-11-06 19:00:17 GMT
2023-11-06 19:00:18.079 GMT [66] LOG:  database system is ready to accept connections
 done
server started
CREATE DATABASE
postgresql 19:00:18.25 INFO  ==> Changing password of postgres
ALTER ROLE
postgresql 19:00:18.29 INFO  ==> Creating user dbuser
CREATE ROLE
postgresql 19:00:18.32 INFO  ==> Granting access to "dbuser" to the database "k8-pg-db"
GRANT
ALTER DATABASE
postgresql 19:00:18.36 INFO  ==> Setting ownership for the 'public' schema database "k8-pg-db" to "dbuser"
ALTER SCHEMA
postgresql 19:00:18.41 INFO  ==> Configuring replication parameters
postgresql 19:00:18.46 INFO  ==> Configuring synchronous_replication
postgresql 19:00:18.47 INFO  ==> Configuring fsync
postgresql 19:00:18.50 INFO  ==> Stopping PostgreSQL...
waiting for server to shut down...2023-11-06 19:00:18.515 GMT [66] LOG:  received fast shutdown request
.2023-11-06 19:00:18.519 GMT [66] LOG:  aborting any active transactions
2023-11-06 19:00:18.523 GMT [66] LOG:  background worker "logical replication launcher" (PID 72) exited with exit code 1
2023-11-06 19:00:18.524 GMT [67] LOG:  shutting down
2023-11-06 19:00:18.526 GMT [67] LOG:  checkpoint starting: shutdown immediate
2023-11-06 19:00:18.583 GMT [67] LOG:  checkpoint complete: wrote 932 buffers (5.7%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.025 s, sync=0.024 s, total=0.060 s; sync files=308, longest=0.007 s, average=0.001 s; distance=4257 kB, estimate=4257 kB; lsn=0/190EF98, redo lsn=0/190EF98
2023-11-06 19:00:18.594 GMT [66] LOG:  database system is shut down
 done
server stopped
postgresql 19:00:18.62 INFO  ==> Loading custom scripts...
postgresql 19:00:18.62 INFO  ==> Enabling remote connections
postgresql 19:00:18.64 INFO  ==> ** PostgreSQL setup finished! **

postgresql 19:00:18.68 INFO  ==> ** Starting PostgreSQL **
2023-11-06 19:00:18.714 GMT [1] LOG:  pgaudit extension initialized
2023-11-06 19:00:18.733 GMT [1] LOG:  starting PostgreSQL 16.0 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-11-06 19:00:18.735 GMT [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-11-06 19:00:18.736 GMT [1] LOG:  listening on IPv6 address "::", port 5432
2023-11-06 19:00:18.745 GMT [1] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-11-06 19:00:18.755 GMT [146] LOG:  database system was shut down at 2023-11-06 19:00:18 GMT
2023-11-06 19:00:18.768 GMT [1] LOG:  database system is ready to accept connections
2023-11-06 19:01:11.410 GMT [180] FATAL:  password authentication failed for user "postgres"
2023-11-06 19:01:11.410 GMT [180] DETAIL:  Connection matched file "/opt/bitnami/postgresql/conf/pg_hba.conf" line 1: "host     all             all             0.0.0.0/0               md5"

So far that looks a lot better. Let’s try logging into the database.

# export PGPASSWORD="justSomeTestPassword1"
root@ubuntu-65b8d8bdcb-w5tbn:/# psql -h k8-pg.database-pipeline.svc.cluster.local -U postgres
psql (12.9 (Ubuntu 12.9-0ubuntu0.20.04.1), server 16.0)
WARNING: psql major version 12, server major version 16.
         Some psql features might not work.
Type "help" for help.

postgres=# 

WOO HOO! Now it worked as I expected.

Why Did This Happen?

If you look at the Bitnami documentation, you’ll find a useful bit of information that simply mentions that some of the environment variables are used on the FIRST deployment of the image. I was assuming that every time I deployed was the “first deployment” since I kept removing the stateful set and changing it. The “problem” was that the PVC was retaining the original deployment. With every deployment that I did, the postgres server was mounting the PVC and using the configuration from the very first deployment.

Finalizing My Deployment

I then did another commit to remove the database and pvc. I also didn’t want a chance of my passwords showing up in the state file so I first created a separate secret in Kubernetes by first creating the below postgres-credentials.yaml file

apiVersion: v1
kind: Secret
metadata:
  name: postgres-credentials
type: Opaque
stringData:
  password: "userPassword1!"
  postgres-password: "adminPassword1!"

I then added this secret to my Kubernetes

 % kubectl -n database-pipeline apply -f postgres-credentials.yaml 
secret/postgres-credentials created

From there, I changed my database.tf file to leverage the values in the secret for my passwords

module "postgresql" {
  source  = "ballj/postgresql/kubernetes"
  version = "~> 1.2"
  namespace = kubernetes_namespace.pipeline.metadata[0].name
  object_prefix = "k8-pg"
  name = "k8-pg-db"
  pvc_name = kubernetes_persistent_volume_claim.postgres-pv.metadata[0].name
  image_tag = "16.0.0"
  env_secret = [
    {
        name = "POSTGRES_POSTGRES_PASSWORD"
        secret = "postgres-credentials"
        key = "postgres-password"
    },
    {
        name = "POSTGRES_PASSWORD"
        secret = "postgres-credentials"
        key = "password"
    }
  ]
}

Another commit and we check our work after the deployment completes

# export PGPASSWORD="adminPassword1!"
# psql -h k8-pg.database-pipeline.svc.cluster.local -U postgres -d k8-pg-db
psql (12.9 (Ubuntu 12.9-0ubuntu0.20.04.1), server 16.0)
WARNING: psql major version 12, server major version 16.
         Some psql features might not work.
Type "help" for help.

k8-pg-db=# 

Looks like it works! I did notice a bug here in that the POSTGRES_PASSWORD that I specified in my deployment was ignored. The module did not honor this and decided to generate its own password for my dbuser. This just means that I ned to get the secret again above using kubectl and jq. The good news is that the password retrieved from the secret worked as expected.

Wrapping Up

It looks like I’ve now got a working infrastructure pipeline setup that could be reused for future deployments. There’s a small bug in the password for the dbuser being ignored but this I can deal with. The next step would be to give the dbuser and the database details over to a DBA and let them begin configuring. This will be my next post!