After writing my From Scattered to Splunk: Bringing Order to Your Kubernetes Logs post, a buddy of mine, Danny Ansell (who, full disclosure, works for Cribl and used to work for Splunk), suggested that I could always import my logs into Cribl as well. I’m madly in love with Splunk and do so love one of their tag lines, “Splunk Rhymes with Drunk,” but I’m always up for kicking the tires on new technology. In this post, I plan to configure the Cribl splunk_hec source for Kubernetes logs.
After receiving this “challenge” from Danny, I was curious about the cost of Cribl. As it turns out, Cribl has a free tier that allows for 1TB/day of data. The Splunk free license allows for 500MB/day, so this is already a win.
Getting Started
Getting started was super easy. I just signed up for an account on Cribl’s site, and after some email verification I was ready to go. When I first logged in, I saw this page:
The Cribl Splunk_HEC Data Source
I was very happy to see this option since I had already configured my Kubernetes cluster to send logs to Splunk using the Splunk OpenTelemetry Collector from SignalFx. I went into the in_splunk_hec source and found the following:
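That screen is where the HEC endpoint and token for the source live, and it’s what I used in the config below. Since the splunk_hec source speaks the standard Splunk HEC protocol, you can sanity-check it with curl before touching Kubernetes at all. This is just a sketch, reusing the placeholder endpoint and token from my config:

% curl -k https://default.main.something.cribl.cloud:8088/services/collector/event \
    -H "Authorization: Splunk my-cribl-splunk-hec-token" \
    -d '{"event": "hello from curl", "sourcetype": "manual-test"}'

If everything is wired up, you should get back the usual HEC-style success response.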
Reconfiguring My Kubernetes Deployment
With my new “Splunk” details in hand, I updated my YAML file:
clusterName: do-k8
splunkPlatform:
  endpoint: https://default.main.something.cribl.cloud:8088
  token: my-cribl-splunk-hec-token
  index: kubernetes
  insecureSkipVerify: true
logsCollection:
  containers:
    excludePaths: []
You’ll notice that I removed my previous excludePaths because I wanted this to be noisy so that I could make sure I was getting logs.
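For reference, excludePaths takes a list of log file path globs to skip. When I turn filtering back on, it would look something like this (the path below is a hypothetical example of the pattern, not my actual config):

logsCollection:
  containers:
    excludePaths:
      # Hypothetical: skip container logs from the kube-system namespace
      - /var/log/pods/kube-system_*/*/*.log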
% helm upgrade my-splunk-otel-collector -f values.yaml splunk-otel-collector-chart/splunk-otel-collector
Release "my-splunk-otel-collector" has been upgraded. Happy Helming!
NAME: my-splunk-otel-collector
LAST DEPLOYED: Sun Feb 4 10:13:05 2024
NAMESPACE: default
STATUS: deployed
REVISION: 10
TEST SUITE: None
NOTES:
Splunk OpenTelemetry Collector is installed and configured to send data to Splunk Platform endpoint "https://default.main.something.cribl.cloud:8088".
I then checked my deployment to make sure everything was up and running:
% kubectl get pod
NAME                                   READY   STATUS    RESTARTS   AGE
my-splunk-otel-collector-agent-4rl7x   1/1     Running   0          52s
my-splunk-otel-collector-agent-9t4lg   1/1     Running   0          56s
my-splunk-otel-collector-agent-vt4lw   1/1     Running   0          60s
Yup! I’ve got new pods up and running. This is great, but what should I do next?
Setting Up Cribl to Handle Kubernetes Logs
Confirming the Cribl Data Source
I wanted to make sure that I was getting logs into Cribl. I saw no errors in my Kubernetes pods, so this was a good sign.
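If you want to do the same check, tailing one of the agent pods is a quick way to spot exporter errors (the pod name comes from the kubectl output above):

% kubectl logs my-splunk-otel-collector-agent-4rl7x --tail=20

Connection refused or 403 errors here usually point to a bad endpoint or token.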
I went back to the Getting Started page shown above and clicked on Manage Streams. This resulted in a page similar to the screen below:
Creating a Pipeline for Kubernetes Logs
With data flowing into Cribl, it just goes to devnull by default; we bring data in and dump it on the floor. In order to begin handling the data, I needed to create a pipeline. Back on the Stream management page, I went to Pipelines, located under the Processing menu item.
On the Pipelines page, I clicked the Add Pipeline button. I’m trying to keep things super simple here, so I just called the pipeline kuberneteslogs and hit Save.
Creating a Route for Kubernetes Logs
In order to connect this to a route, I clicked the Attach Route button, which displayed a blank page because I had not created any routes. I clicked the Add Route button. From there, I accepted the defaults and gave it a name I could recognize.
Committing Our Changes
Since I’m running the cloud version, I need to do a Commit and Deploy whenever I make changes. With all of the changes I’ve made, now is a good time. I also need to do this so the data source sends logs through my pipeline. In order to commit the changes, click the Commit & Deploy button in the upper-right corner of the screen. You will be asked to enter some details about the commit, and then you can proceed.
Verifying Our Changes
After I did the commit and deploy, I wanted to make sure I was getting logs into my pipeline. I opened the Monitoring menu, and then from the Data drop-down I clicked on Routes to review a page similar to the one below.
Doing Something With Your Logs
Cribl does not specifically have a search function like Splunk or other SIEMs. Yes, I know it “can” be set up to search object stores like S3, but I’m not interested in storing my logs in S3 right now. Instead, I’m going to ship my logs back to my Splunk so that I can do searching and more.
Getting Your Cribl Egress IP Address(es)
In the event that you are a sometimes-paranoid security dweeb, you too can lock down your infrastructure. Cribl publishes a list of Egress IP Addresses for your cloud deployment so that you can use them in your Ingress rules. When you first log in to Cribl, there’s an Access Details button at the very top. Clicking it reveals a bunch of information, so I won’t show it here, but you will find your Egress IP Addresses listed.
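What you do with those addresses depends on how your Splunk host is protected. As one sketch, assuming a host firewall managed with ufw (the IP below is a documentation placeholder, not a real Cribl address), you might allow only Cribl’s egress IPs to reach Splunk’s HEC port:

% sudo ufw allow from 203.0.113.10 to any port 8088 proto tcp comment 'Cribl Cloud egress to Splunk HEC'

Repeat the rule for each egress address in the list.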
Configuring Cribl for My Splunk HEC
Now that I’ve got my Cribl Egress IPs added to my Ingress rules for Splunk, I need to create a new destination in Cribl for my Splunk instance. In order to do this, I’ll go back to managing my Cribl Stream worker and open up the Destinations page, where I created a new splunk_hec destination named DigitalOceanSplunk pointing at my Splunk’s HEC endpoint and token.
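The destination needs a Splunk HEC token to authenticate with. If you’d rather create that token from the command line than click through Splunk Web, Splunk’s REST API can do it; this is just a sketch, and the hostname, credentials, and token name below are all placeholders:

% curl -k -u admin https://my-splunk.example.com:8089/servicesNS/admin/splunk_httpinput/data/inputs/http \
    -d name=cribl-hec

The generated token value comes back in the response, and that’s what goes into the Cribl destination config.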
Creating a New Route to My Splunk
My plan is to use Cribl to send only logs of interest to my Splunk for future use. For starters, I’m going to send my web server logs by updating my KubernetesLogs Route. If you look at the Routes page, you’ll see that it already has an endRoute that dumps logs to devnull, so our previous route is redundant. So that I only send logs of interest to my Splunk, I need to set up a filter in Cribl on this route as well. My filter looks something like this:
sourcetype==="kube:container:nginx" && source.includes("wordpress") && !(_raw.includes("system-probe") || _raw.includes("Uptime Monitor"))
This filter looks for any logs coming from my nginx containers whose source includes wordpress. In addition to this inclusion, I’m excluding any of the server probes that I have coming at my web servers. With this in place, I should be able to reduce some of the log noise coming from those monitors.
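To make the field references concrete, here’s the shape of a hypothetical event that would pass the filter; the path and log line are made up for illustration:

{
  "sourcetype": "kube:container:nginx",
  "source": "/var/log/pods/default_wordpress-abc123_.../nginx/0.log",
  "_raw": "10.0.0.5 - - [04/Feb/2024:10:15:00 +0000] \"GET / HTTP/1.1\" 200 512"
}

An event with “system-probe” or “Uptime Monitor” anywhere in _raw would be knocked out by the negated clause.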
After adding the filter, I changed the destination from devnull:devnull to my splunk_hec:DigitalOceanSplunk. I also made sure Final was selected because I don’t want these logs flowing into later routes.
I did a Save and then another Commit & Deploy. Let’s see if we’re getting web server logs… Success!
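To verify on the Splunk side, a search along these lines (the index comes from the values.yaml earlier, and the sourcetype and source match the Cribl filter) should surface the nginx events:

index=kubernetes sourcetype="kube:container:nginx" source=*wordpress*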
Wrapping it All Up
I set out to configure the Cribl splunk_hec source for Kubernetes logs, and I was successful. I’ve now got a flow of logs from my Kubernetes cluster into Cribl. In Cribl, I’m seeing all logs, but I’m filtering out things that are not important to me. From there, I’m sending the logs that I care about to my Splunk.
Thanks Danny!