From Scattered to Splunk: Bringing Order to Your Kubernetes Logs

In my Silence Not Golden: Fixing the Mute Button on Tetragon Logs post, I realized that I was an idiot. I also realized that I needed to get all of my Kubernetes logs into one place for a number of reasons. I’m most familiar with Splunk, so it makes sense to centralize my Kubernetes logs there.

In this post, I’m going to walk through configuring Splunk and Kubernetes so that all of my logs end up in a central location. I’m not going to cover setting up a Splunk instance, so I’ll assume that you have already deployed Splunk Cloud or some version of Splunk Enterprise.

Setting Up Your Splunk Instance

If you are familiar with Splunk, this part is pretty easy. If you are not, there’s already some really good documentation online. My configuration is going to make use of a Splunk HTTP Event Collector (HEC). The link that I referenced provides a ton of detail on configuring the Splunk HEC and obtaining a token to be used for this integration, so I won’t cover that in this post either.

The most important things that you’ll need to move forward are:

  • Splunk HEC Address: This is typically https://<your_splunk_hostname_or_ip>:8088/services/collector. If you have not enabled TLS/HTTPS for your HEC, then it will be http instead of https.
  • Splunk HEC Token: This will be a UUID-looking string of letters and numbers.

Once you have these items, you can continue on to the rest of the instructions.
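
Before moving on, it’s worth sanity checking the HEC with a quick curl. This is just a throwaway test event, not part of the Kubernetes setup; the hostname and token are placeholders, and the -k flag skips certificate validation in case you’re using a self-signed certificate like I am:

% curl -k https://<your_splunk_hostname_or_ip>:8088/services/collector \
    -H "Authorization: Splunk <your_hec_token>" \
    -d '{"event": "hello from the laptop", "sourcetype": "hec_test"}'
{"text":"Success","code":0}

If you get a success response like the above, the HEC is ready for the Collector.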

Setting Up Kubernetes

After looking around for ways to get logs into Splunk, I discovered the Splunk OpenTelemetry Collector from SignalFX. To make deployment on Kubernetes easier, they provide a Helm chart, so that’s what we’ll use.

Creating a Values File for the Helm Chart

I’m going to first create a basic values.yaml file to be used for the deployment:

clusterName: do-k8
splunkPlatform:
  endpoint: https://<my_splunk>:8088/services/collector
  token: xxxxxxxxx
  index: kubernetes
  insecureSkipVerify: true

This file includes the very basic entries that I’ll need for deployment.

I’m using the clusterName entry to tag all of my logs with the cluster name. I’m doing this to future-proof my deployment in case I configure additional clusters to log to my Splunk.
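
As a quick illustration of why that matters: assuming the cluster name surfaces as a field on each event (the k8s.cluster.name field below is my guess — check the field sidebar in Splunk for the exact name in your data), breaking logs out per cluster later is a one-liner:

index=kubernetes | stats count by k8s.cluster.name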

The splunkPlatform parameters tell the OpenTelemetry Collector where to send data. As mentioned previously, I’ve got my Splunk server, splunkPlatform.endpoint, and the HEC token, splunkPlatform.token. My HEC is configured to use the kubernetes index, and I’m also telling the OpenTelemetry Collector to send to the kubernetes index via splunkPlatform.index. My Splunk is using a self-signed certificate, so I’m using the splunkPlatform.insecureSkipVerify setting to disable certificate validation.

This is all that I’m including in my values.yaml file for now so that I can get things deployed. There are a bunch of other options you can configure via the values.yaml file and I do suggest you check them out.

Using Helm to Deploy to Kubernetes

Now that I have my values.yaml file, I’ll use helm to deploy the OpenTelemetry Collector:

% helm repo add splunk-otel-collector-chart https://signalfx.github.io/splunk-otel-collector-chart
"splunk-otel-collector-chart" has been added to your repositories

% helm install my-splunk-otel-collector -f values.yaml splunk-otel-collector-chart/splunk-otel-collector
NAME: my-splunk-otel-collector
LAST DEPLOYED: Wed Jan 17 10:27:10 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Splunk OpenTelemetry Collector is installed and configured to send data to Splunk Platform endpoint "https://my_splunk:8088/services/collector".
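
Before jumping into Splunk, a quick kubectl check confirms the Collector pods are up. The chart runs an agent pod on each node, so with my 3 node cluster I expect to see three Running pods (pod names will vary based on your release name):

% kubectl get pods | grep splunk-otel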

After deployment, I logged into my Splunk and did a quick search on the kubernetes index to see if it was working.
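
Nothing fancy on the search side; pulling everything back from the index is enough to see whether events are flowing:

index=kubernetes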

I am getting logs from my Kubernetes cluster! I’m getting lots of logs! The good news is that I already see logs coming from Tetragon. If you look at the INTERESTING FIELDS on the left side of the screenshot, you’ll also see node_name with 3 values. This node_name field comes from my Tetragon logs, and I have 3 nodes in my Kubernetes cluster.

The good news is that I’m getting the logs that I want. The bad news is that I’m getting all kinds of other logs too.

Tuning the Kubernetes Logs Shipped to Splunk

Now that I have centralized my Kubernetes logs with Splunk, I need to work on filtering out logs so that I don’t tip over my Splunk or kill my Splunk license. The first step in this process is to decide where I can do the filtering. Ideally, I would prefer to configure the OpenTelemetry Collector to only send me logs that I care about.

After looking at the documentation here, I found that there is a logsCollection setting in the values.yaml file that controls what logs are collected. In looking over the values.yaml file in the GitHub repo, I found that I can tell the Collector to ignore certain log paths.

Using Splunk to Find the Noisy Containers

My first thought was to search Splunk to find my noisy sources. The Collector tags all of the logs with a source field containing the log path, so the below Splunk query is a great way to begin tuning:

index=kubernetes | stats count by source | sort -count

This query runs against my kubernetes index, counting the number of logs per source and then sorting those counts in descending order. The query produces something like the results below:

Unfortunately, it looks like my Tetragon containers are the noisy little things in all of this. That’s ok. I’m taking baby steps here, so I’m going to tune them out first because I have some additional ideas for making them more manageable in the future.
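
Since Tetragon runs in kube-system and that’s the namespace I’m about to drop, I can also double check how much of the volume is coming from kube-system pods using nothing but the source field (so no assumptions about extra metadata fields):

index=kubernetes source="/var/log/pods/kube-system_*" | stats count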

Updating my Helm Deployment

I’m going to update my previous values.yaml file to ignore the Tetragon container logs. Actually, I’m just going to ignore the whole kube-system namespace for now:

clusterName: do-k8
splunkPlatform:
  endpoint: https://my_splunk:8088/services/collector
  token: xxxxxxx
  index: kubernetes
  insecureSkipVerify: true
logsCollection:
  containers:
    excludePaths: ["/var/log/pods/kube-system_*/*/*.log"]
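
As an aside, if you’d rather keep the rest of kube-system and only drop the Tetragon pods, a narrower glob along these lines should work; the tetragon* pod-name pattern is a guess based on how the DaemonSet names its pods, so adjust it to match what you actually see under /var/log/pods:

logsCollection:
  containers:
    excludePaths: ["/var/log/pods/kube-system_tetragon*/*/*.log"]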

After updating the YAML, I did an upgrade to redeploy my changes:

% helm upgrade my-splunk-otel-collector -f values.yaml splunk-otel-collector-chart/splunk-otel-collector
Release "my-splunk-otel-collector" has been upgraded. Happy Helming!
NAME: my-splunk-otel-collector
LAST DEPLOYED: Wed Jan 17 10:33:11 2024
NAMESPACE: default
STATUS: deployed
REVISION: 2
TEST SUITE: None
NOTES:
Splunk OpenTelemetry Collector is installed and configured to send data to Splunk Platform endpoint "https://my_splunk:8088/services/collector".
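
To confirm the exclusion is actually taking effect, I can rerun the earlier breakdown against just the last few minutes of data; new events from kube-system sources should stop showing up (anything already indexed will of course still be searchable):

index=kubernetes earliest=-15m | stats count by source | sort -count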

Now, when I look at my Splunk, I’m getting a lot fewer logs, so yay.

Conclusion

The good news is that I was able to centralize my Kubernetes logs with Splunk. While centralizing my logs was the original intent of this post, I didn’t solve my original reason for centralizing them: I tuned out my Tetragon logs, so they aren’t being shipped to my Splunk at all. I’m ok with this for now because I know that I have some tuning to do with Tetragon before I let those logs back into Splunk.

I have some of the pieces coming together. I’m getting some observability into my Kubernetes cluster, which is a huge success. I’ve also got a way to send that observability into something I’m comfortable using: Splunk. The next steps will be to do some additional tuning in Tetragon to make sure I’m only looking for events that I care about. After I have those events pruned a little better, I can allow the logs back into my Splunk. From there, I hope that I can put some policies in place in Tetragon to control what is possible.

…More to come!