Cribl Splunk_HEC Datasource: The Ultimate Guide for Kubernetes Log Ingestion

After writing my From Scattered to Splunk: Bringing Order to Your Kubernetes Logs post, a buddy of mine, Danny Ansell (who, by the way, works for Cribl and used to work for Splunk), suggested that I could always import my logs into Cribl as well. I’m madly in love with Splunk and do so love one of their taglines, Splunk Rhymes with Drunk, but I’m always up for kicking the tires on new technology. In this post, I plan to configure the Cribl splunk_hec Data Source for Kubernetes logs.

After receiving this “challenge” from Danny, I was curious about the cost of Cribl. As it turns out, Cribl has a free tier that allows for 1TB/day of data. The Splunk Free license allows for only 500MB/day, so this is already a win.

Getting Started

Getting started was super easy. I just signed up for an account on Cribl’s site and, after some email verification, I was ready to go. When I first logged in, I saw this page:

Get Data Into Cribl seemed like a great place to start! After opening that page, I noticed that there was an ACL link that brought up the below page.

The original configuration allowed any source to send logs into my Cribl instance. I didn’t want this, so I locked it down to my Kubernetes nodes and my ISP’s IP. The Data Sources page seemed like a great next step in the process. This link revealed the below page (I’ve trimmed some of the page off so you don’t see my instance information – sorry):

There are a bunch of options on this page. If you look closely, there’s an in_splunk_hec Data Source. As I noted before, I didn’t include all of the details, but just after the TLS settings you’ll find your Splunk HEC URL, with port 8088 as well!

The Cribl Splunk_HEC Data Source

I was very happy to see this option since I had already configured my Kubernetes cluster to send logs to Splunk using the Splunk OpenTelemetry Collector from SignalFx. I went into the in_splunk_hec Data Source and found the following:

These are just some basic settings, so there’s nothing to configure here. I noticed the Auth Token link, so I clicked on that.

This is where I found the Splunk HEC token for my configuration. I now have my Splunk HEC URL and Token so the next step was to reconfigure my OpenTelemetry deployment.
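With the URL and token in hand, you can sanity-check the endpoint before touching the cluster. This is just a sketch using Splunk HEC’s standard /services/collector/event path; the URL and token below are placeholders for the values from your own Cribl instance, not real ones:

```shell
# Placeholders: substitute the HEC URL and Auth Token from your Cribl instance.
CRIBL_HEC_URL="https://<your-cribl-instance>:8088"
CRIBL_HEC_TOKEN="<your-auth-token>"

# Send a single test event over the standard Splunk HEC event endpoint.
curl -sS "${CRIBL_HEC_URL}/services/collector/event" \
  -H "Authorization: Splunk ${CRIBL_HEC_TOKEN}" \
  -d '{"event": "hello from curl", "sourcetype": "manual-test", "index": "kubernetes"}'
```

If the event lands, you’ll see it in Cribl’s live capture later on, which is a nice baseline before reconfiguring the collector.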

Reconfiguring My Kubernetes Deployment

With my new “Splunk” details, I updated my values.yaml with the new details (the parent keys below follow the splunk-otel-collector chart’s values layout; the endpoint placeholder is the HEC URL from the Cribl Data Sources page):

clusterName: do-k8
splunkPlatform:
  # HEC URL shown on the Cribl Data Sources page (port 8088)
  endpoint: https://<your-cribl-instance>:8088/services/collector
  token: my-cribl-splunk-hec-token
  index: kubernetes
  insecureSkipVerify: true
logsCollection:
  containers:
    excludePaths: []

You’ll notice that I removed my previous excludePaths because I wanted this to be noisy so that I could make sure I’m getting logs.
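For reference, excludePaths takes a list of file path globs to skip. Once I confirm the noise, a hypothetical entry to drop kube-system container logs might look something like:

```yaml
logsCollection:
  containers:
    # Hypothetical example: skip container logs from kube-system pods.
    excludePaths:
      - /var/log/pods/kube-system_*/*/*.log
```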

% helm upgrade my-splunk-otel-collector -f values.yaml splunk-otel-collector-chart/splunk-otel-collector
Release "my-splunk-otel-collector" has been upgraded. Happy Helming!
NAME: my-splunk-otel-collector
LAST DEPLOYED: Sun Feb  4 10:13:05 2024
NAMESPACE: default
STATUS: deployed
Splunk OpenTelemetry Collector is installed and configured to send data to Splunk Platform endpoint "".

I then checked my deployment to make sure everything was up and running:

% kubectl get pod
NAME                                   READY   STATUS    RESTARTS       AGE
my-splunk-otel-collector-agent-4rl7x   1/1     Running   0              52s
my-splunk-otel-collector-agent-9t4lg   1/1     Running   0              56s
my-splunk-otel-collector-agent-vt4lw   1/1     Running   0              60s

Yup! I’ve got new pods up and running. This is great but what should I do next?

Setting Up Cribl to Handle Kubernetes Logs

Confirming the Cribl Data Source

I wanted to make sure that I was getting logs into Cribl. I saw no errors in my Kubernetes pods, so this was a good sign. I went back to the Getting Started page shown above and clicked on Manage Streams. This brought up a page similar to the below screen:

I clicked on the default worker pane to reveal the following screen:

At this point, I still knew nothing, so I started looking at the menu items on this page. Under the Data menu, there’s an option called Sources that shows the below page:

There’s my Splunk HEC data source! I decided to click this tile and got the following page:

So far, I’ve made progress around the site but it’s not super useful to know whether I’m getting logs into Cribl or not. I do see that little Live button on the right side of the page. I wonder what it does?

Success! This is what I was looking for! On this page, I can see that my logs are coming into Cribl. I can also click on the Charts link and I see some useful information regarding data coming in:

I’m off to a good start! I’ve confirmed that logs are coming into Cribl but what should I do next?

Creating a Pipeline for Kubernetes Logs

With data flowing into Cribl, it just goes to devnull by default, so we’re bringing data in and dumping it on the floor. To begin handling the data, I needed to create a pipeline. Back on the Stream management page, I went to Pipelines, located under the Processing menu item.

On the Pipelines page, I clicked the Add Pipeline button. I’m trying to keep things super simple here so I just called the pipeline kuberneteslogs and hit Save.

After hitting Save, I was presented with the options to Add Functions and Add Route. I’m not going to worry about adding functions yet because I’m not sure what I want to do with this data. For now, I’m just going to create a route.

Creating a Route for Kubernetes Logs

To connect this to a route, I clicked the Attach Route button, which displayed a blank page because I had not created any routes yet. I clicked the Add Route button, accepted the defaults, and gave it a name I could recognize.

I clicked Save again.

Committing Our Changes

Since I’m running in the cloud version, I need to do a Commit and Deploy whenever I make changes. With all of the changes I’ve made, I think it’s a good idea to do this. I also need to do this so the data source sends logs through my pipeline. In order to commit the changes, click the Commit & Deploy button in the upper right corner of the screen. You will be asked to enter some details about the commit and then you can proceed.

Verifying Our Changes

After the commit and deploy, I wanted to make sure I was getting logs into my pipeline. I opened the Monitoring menu and then, from the Data drop-down, clicked on Routes to reveal a page similar to the below.

Based upon the graphs, it does look like I’m getting logs! I wanted to make sure that these were the logs I was looking for…I clicked the Live button on the IN heading to see:

This looks good. I’m getting the logs into the pipeline and everything is flowing like I’d expect. Now what?

Doing Something With Your Logs

Cribl does not specifically have a search function like Splunk or other SIEMs. Yes, I know it “can” be set up to search object stores like S3, but I’m not interested in storing my logs in S3 right now. I’m going to ship my logs back to my Splunk so that I can do searching and more.

Getting Your Cribl Egress IP Address(es)

In the event that you are a sometimes-paranoid security dweeb, you too can lock down your infrastructure. Cribl publishes a list of Egress IP Addresses for your cloud deployment so that you can use them in your Ingress rules. When you first log in to Cribl, there’s an Access Details button at the very top. Clicking it reveals a bunch of different information, so I won’t show it here, but you will find your Egress IP Addresses listed.

Configure Cribl for My Splunk HEC

Now that I’ve got my Cribl Egress IPs added to my Ingress rules for Splunk, I need to create a new destination in Cribl for my Splunk instance. To do this, I’ll go back to managing my Cribl Stream worker and open up the Destinations page.

You’ll notice that there are a bunch of different options for sending your logs somewhere. For my purposes, I’m going to open Splunk HEC.

There are no destinations listed so I will need to create one by clicking the Add Destination button.

The settings are pretty straightforward, so I added my HEC endpoint and Token to the page. My Splunk server is running on a self-signed certificate (I know, I know, I know), so I also had to go into the Advanced Settings and disable Validate Server Certs. After setting this up, I clicked the Save button.

Creating A New Route to My Splunk

My plan is to use Cribl to send only logs of interest to my Splunk for future use. For starters, I’m going to send my web server logs by updating my KubernetesLogs Route. If you look at the Routes page, you’ll see that it already has an endRoute that dumps logs to devnull, so our previous route is redundant. So that I only send logs of interest to my Splunk, I needed to set up a filter in Cribl on this route as well. My filter looks something like the below:

sourcetype==="kube:container:nginx" && source.includes("wordpress") && !(_raw.includes("system-probe") || _raw.includes("Uptime Monitor"))

This filter is looking for any logs coming from my nginx containers whose source includes wordpress. In addition to this inclusion, I’m excluding any of the server probes that I have hitting my web servers. With this in place, I should be able to reduce some of the log noise coming from those monitors.
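Cribl filters are plain JavaScript expressions, so you can sanity-check the logic locally before committing. Here’s a sketch of how that expression evaluates against some made-up sample events (sourcetype, source, and _raw are the field names Cribl presents; the event values are illustrative, not real log data):

```javascript
// The route filter from above, wrapped as a function over an event object.
const filter = (e) =>
  e.sourcetype === "kube:container:nginx" &&
  e.source.includes("wordpress") &&
  !(e._raw.includes("system-probe") || e._raw.includes("Uptime Monitor"));

// Made-up sample events for illustration.
const events = [
  // nginx log from a wordpress pod, no monitor noise: should pass
  { sourcetype: "kube:container:nginx", source: "/var/log/pods/wordpress-abc/nginx/0.log", _raw: "GET /index.php 200" },
  // nginx log from a wordpress pod, but it's an uptime probe: excluded
  { sourcetype: "kube:container:nginx", source: "/var/log/pods/wordpress-abc/nginx/0.log", _raw: 'GET / 200 "Uptime Monitor"' },
  // wrong sourcetype entirely: excluded
  { sourcetype: "kube:container:redis", source: "/var/log/pods/wordpress-abc/redis/0.log", _raw: "ping" },
];

events.forEach((e) => console.log(filter(e)));
// prints: true, then false, then false
```

Only the first event would make it through to the splunk_hec destination; the other two stay behind.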

After adding the filter, I changed the destination from devnull:devnull to my splunk_hec:DigitalOceanSplunk. I also made sure Final was selected because I don’t want these logs flowing into future routes.

I did a Save and then another Commit And Deploy. Let’s see if we’re getting web server logs…Success!

Wrapping it All Up

I set out to configure the Cribl splunk_hec for Kubernetes logs, and I was successful. I’ve now got a flow of logs from my Kubernetes cluster into Cribl. In Cribl, I’m seeing all logs, but I’m filtering out things that are not important to me. From there, I’m sending the logs that I care about to my Splunk.

Thanks Danny!