In my Silence Not Golden: Fixing the Mute Button on Tetragon Logs post, I realized that I was an idiot. I also realized that I needed to get all of my Kubernetes logs into one place for a number of reasons. I’m most familiar with Splunk, so it makes sense to centralize my Kubernetes logs there.
In this post, I’m going to walk through configuring Splunk and Kubernetes so that all of my logs are in a central location. I’m not going to cover setting up a Splunk instance, so I’ll assume that you have already deployed Splunk Cloud or some version of Splunk Enterprise.
Setting Up Your Splunk Instance
If you are familiar with Splunk, this part is pretty easy. If you are not familiar with Splunk, then there’s already some really good documentation online. My configuration is going to make use of a Splunk HTTP Event Collector (HEC). The link that I referenced provides a ton of details on configuring the Splunk HEC and obtaining a token to be used for this integration so I won’t cover that in this post either.
The most important things that you’ll need to move forward are:
- Splunk HEC Address: This is typically https://<your_splunk_hostname_or_ip>:8088/services/collector. If you have not enabled TLS/HTTPS for your HEC, then it will be http instead of https.
- Splunk HEC Token: This will be some UUID-looking string of letters and numbers.
Once you have these items, you can continue onto the rest of the instructions.
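Before touching Kubernetes, it can help to confirm that the address and token actually work. The sketch below is one way to do that with nothing but the Python standard library. The hostname, token, and helper names (build_hec_request, send_test_event) are placeholders of mine, not anything from Splunk, and it assumes the HEC accepts a JSON event with the token passed in a "Splunk <token>" Authorization header.

```python
import json
import ssl
import urllib.request

# Placeholder values: substitute your own HEC address and token.
HEC_URL = "https://splunk.example.com:8088/services/collector"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"


def build_hec_request(url: str, token: str, event: str) -> urllib.request.Request:
    """Build a POST request carrying a single test event for the HEC."""
    payload = json.dumps({"event": event}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_test_event(url: str, token: str, verify_tls: bool = True) -> str:
    """Send the test event and return the raw response body."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Only for self-signed lab certificates: this turns off validation.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    request = build_hec_request(url, token, "hello from the HEC smoke test")
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return response.read().decode("utf-8")


# Example call (needs a reachable HEC, so it is left commented out):
# print(send_test_event(HEC_URL, HEC_TOKEN, verify_tls=False))
```

If the call comes back with a short JSON acknowledgement rather than an HTTP error, the address and token should be good to use in the rest of the steps.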
Setting Up Kubernetes
After looking around for ways to get logs into Splunk, I discovered the Splunk OpenTelemetry Collector from SignalFx. To make deployment on Kubernetes easier, they provide a Helm chart, so that’s what we’ll use.
Creating a Values File for the Helm Chart
I’m going to first create a basic values.yaml file to be used for the deployment:
clusterName: do-k8
splunkPlatform:
  endpoint: https://<my_splunk>:8088/services/collector
  token: xxxxxxxxx
  index: kubernetes
  insecureSkipVerify: true
This file includes the very basic entries that I’ll need for deployment.
I’m using the clusterName entry to tag all of my logs with the cluster. I’m doing this to future-proof my deployment in case I configure additional clusters to log to my Splunk.
The splunkPlatform parameters are needed to configure the OpenTelemetry Collector. As mentioned previously, I’ve got my Splunk server, splunkPlatform.endpoint, and the HEC token, splunkPlatform.token. I’ve configured my HEC to use the kubernetes index but I’m also telling the OpenTelemetry Collector that I want it to send to the kubernetes index. My Splunk is using a self-signed certificate so I’m also using the splunkPlatform.insecureSkipVerify setting to disable certificate validation.
This is all that I’m including in my values.yaml file for now so that I can get things deployed. There are a bunch of other options you can configure via the values.yaml file and I do suggest you check them out.
Using Helm to Deploy to Kubernetes
Now that I have my values.yaml file, I’ll use Helm to deploy the OpenTelemetry Collector:
% helm repo add splunk-otel-collector-chart https://signalfx.github.io/splunk-otel-collector-chart
"splunk-otel-collector-chart" has been added to your repositories
% helm install my-splunk-otel-collector -f values.yaml splunk-otel-collector-chart/splunk-otel-collector
NAME: my-splunk-otel-collector
LAST DEPLOYED: Wed Jan 17 10:27:10 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Splunk OpenTelemetry Collector is installed and configured to send data to Splunk Platform endpoint "https://my_splunk:8088/services/collector".
After deployment, I logged into my Splunk and did a quick search on the kubernetes index to see if it’s working.
I am getting logs from my Kubernetes cluster! I’m getting lots of logs! The good news is that I already see logs coming from Tetragon. If you look at the INTERESTING FIELDS on the left side of the screenshot, you’ll also see node_name with a count of 3. This node_name field is coming from my Tetragon logs, and the count lines up with the 3 nodes in my Kubernetes cluster.
The good news is that I’m getting the logs that I want. The bad news is that I’m getting all kinds of other logs too.
Tuning the Kubernetes Logs Shipped to Splunk
Now that I have centralized my Kubernetes logs with Splunk, I need to work on filtering out logs so that I don’t tip over my Splunk or kill my Splunk license. The first step in this process is to decide where I can do the filtering. Ideally, I would prefer to configure the OpenTelemetry Collector to only send me logs that I care about.
After looking at the documentation here, I found that there is a logsCollection setting in the values.yaml file that controls which logs are collected. In looking over the values.yaml file in the GitHub repo, I found that I can tell the Collector to ignore certain log paths.
Using Splunk to Find the Noisy Containers
My first thought was to search Splunk to find my noisy sources. The Collector tags all of the logs with a source that defines the log path so the below Splunk query is a great way to begin tuning:
index=kubernetes |stats count by source |sort -count
This query runs against my kubernetes index, counting the number of logs per source and then sorting those counts in descending order. The query produces something like the below.
Unfortunately, it looks like my Tetragon containers are the noisy little things in all of this. That’s ok. I’m taking baby steps here so I’m first going to tune them out because I have some additional ideas for making them more manageable in the future.
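To make the counting logic concrete, here is a rough Python equivalent of "count by source, sort descending", plus a roll-up by namespace. It assumes the source values follow the usual kubelet layout of /var/log/pods/<namespace>_<pod>_<uid>/<container>/<restart>.log. The sample paths and the helper names (namespace_of, count_by) are made up for illustration and are not real data from my cluster.

```python
from collections import Counter

# Hypothetical source values, shaped like kubelet pod log paths:
# /var/log/pods/<namespace>_<pod>_<uid>/<container>/<restart>.log
SAMPLE_SOURCES = [
    "/var/log/pods/kube-system_tetragon-abcde_1111/export-stdout/0.log",
    "/var/log/pods/kube-system_tetragon-abcde_1111/export-stdout/0.log",
    "/var/log/pods/kube-system_tetragon-fghij_2222/export-stdout/0.log",
    "/var/log/pods/default_web-12345_3333/nginx/0.log",
]


def namespace_of(source: str) -> str:
    """Return the namespace encoded in a pod log path."""
    pod_dir = source.split("/")[4]  # "<namespace>_<pod>_<uid>"
    return pod_dir.split("_", 1)[0]


def count_by(values):
    """Count values and list them from most to least frequent."""
    return Counter(values).most_common()


# Same idea as: stats count by source | sort -count
by_source = count_by(SAMPLE_SOURCES)

# Rolling the counts up by namespace shows which namespace is noisiest.
by_namespace = count_by(namespace_of(s) for s in SAMPLE_SOURCES)
print(by_namespace)
```

Grouping by namespace rather than by individual file is a design choice on my part: it maps more directly onto a path pattern that ignores a whole namespace.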
Updating my Helm Deployment
I’m going to update my previous values.yaml file to ignore the Tetragon container logs for now. I’m actually going to just ignore the whole kube-system namespace for now:
clusterName: do-k8
splunkPlatform:
  endpoint: https://my_splunk:8088/services/collector
  token: xxxxxxx
  index: kubernetes
  insecureSkipVerify: true
logsCollection:
  containers:
    excludePaths: ["/var/log/pods/kube-system_*/*/*.log"]
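If you want to sanity-check what a pattern like that would catch before redeploying, a quick local test is one option. The sketch below uses Python’s PurePosixPath.match as a stand-in for the Collector’s glob matching. That is an approximation on my part (the Collector has its own matcher and may differ in edge cases), and the candidate paths and the is_excluded helper are hypothetical.

```python
from pathlib import PurePosixPath

# The pattern from the excludePaths entry in values.yaml.
EXCLUDE_PATTERN = "/var/log/pods/kube-system_*/*/*.log"

# Hypothetical log paths, for illustration only.
CANDIDATES = [
    "/var/log/pods/kube-system_tetragon-abcde_1111/export-stdout/0.log",
    "/var/log/pods/kube-system_coredns-xyz_2222/coredns/3.log",
    "/var/log/pods/default_web-12345_3333/nginx/0.log",
]


def is_excluded(path: str, pattern: str = EXCLUDE_PATTERN) -> bool:
    """True when the path matches the pattern segment by segment."""
    # With an absolute pattern, the whole path has to match, and "*"
    # stays inside a single path segment.
    return PurePosixPath(path).match(pattern)


for path in CANDIDATES:
    print("drop" if is_excluded(path) else "keep", path)
```

With these sample paths, everything under a kube-system_ pod directory is dropped and the default namespace log is kept, which is the behavior I’m after.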
After updating the YAML, I did an upgrade to redeploy my changes:
% helm upgrade my-splunk-otel-collector -f values.yaml splunk-otel-collector-chart/splunk-otel-collector
Release "my-splunk-otel-collector" has been upgraded. Happy Helming!
NAME: my-splunk-otel-collector
LAST DEPLOYED: Wed Jan 17 10:33:11 2024
NAMESPACE: default
STATUS: deployed
REVISION: 2
TEST SUITE: None
NOTES:
Splunk OpenTelemetry Collector is installed and configured to send data to Splunk Platform endpoint "https://my_splunk:8088/services/collector".
Now, when I look at my Splunk, I’m getting a lot fewer logs, so yay.
Conclusion
The good news is that I was able to centralize Kubernetes logs with Splunk. While centralizing my logs was the original intent of the post, I didn’t solve my original reason for centralizing the logs. I successfully tuned out my Tetragon logs so they aren’t being shipped to my Splunk. I’m ok with this for now because I know that I have some tuning to do with Tetragon before I let those logs back into Splunk.
Some of the pieces are coming together. I’m getting some observability into my Kubernetes cluster, which is a huge success. I’ve also got a way to send that observability data into something I’m comfortable using: Splunk. The next steps will be to do some additional tuning in Tetragon to make sure I’m only looking for events that I care about. After I have those events pruned a little better, I can allow the logs back into my Splunk. From there, I hope that I can write some policies in Tetragon to control what is possible.
…More to come!