Silence Not Golden: Fixing the Mute Button on Tetragon Logs

In my previous post, From Reactive to Proactive: Transforming Security with Tetragon, I got Tetragon installed and working. After doing some digging, I found that I was only seeing events and logs for one of the Kubernetes nodes in my cluster. This article covers how I noticed this and how I tracked down the cause.

Identifying That I Had a Logging Issue

While poking around at the logs and events from Tetragon, I noticed that I wasn’t getting logs and events from all of my pods. As an example, I have the following two nginx pods on separate Kubernetes nodes:

% kubectl get pod -o wide
NAME      READY   STATUS    RESTARTS        AGE   IP   NODE
nginx-0   2/2     Running   19 (4d2h ago)   27d        pool-cfch5xp3i-xonc2
nginx-1   2/2     Running   20 (4d2h ago)   27d        pool-cfch5xp3i-xoncp

If I run the tetra getevents command, I only see events from my nginx-1 pod:

% kubectl exec -ti -n kube-system ds/tetragon -c tetragon -- tetra getevents -o compact |grep 'default/nginx'           
🚀 process default/nginx-1 /usr/bin/git rev-parse HEAD
💥 exit    default/nginx-1 /usr/bin/git rev-parse HEAD 0
🔌 connect default/nginx-1 /usr/bin/ssh tcp -> 

I do not see any events coming from my nginx-0 pod. I tried a bunch of different things to force events from it, but none showed up.

Identifying That I am an Idiot

I took a step back from what I was doing and then I realized that I was an idiot. I did not have a logging problem after all! The problem was actually in the tetragon command that I was running. I was still working off of the examples provided in the documentation:

% kubectl exec -ti -n kube-system ds/tetragon -c tetragon -- tetra getevents

In case you missed it in the command, the examples have you run it against ds/tetragon. This tells Kubernetes to run the tetra command on just one of the pods in the tetragon DaemonSet, so I only see events from the node that pod happens to be running on. Now I feel like a complete moron.

Looking For Logs in All the Right Places

If I look at this again, I can see that my nginx-0 pod is running on the pool-cfch5xp3i-xonc2 node. If I check the tetragon pods in my kube-system namespace, I can spot the one that is running on the same node:

% kubectl -n kube-system get pod -o wide|grep tetra 
tetragon-2twpw                       2/2     Running   0              28d   pool-cfch5xp3i-xoncp
tetragon-b2d66                       2/2     Running   0              28d   pool-cfch5xp3i-xoncs
tetragon-operator-5f5489bfd9-4k4vd   1/1     Running   0              20m    pool-cfch5xp3i-xoncp
tetragon-wsgmg                       2/2     Running   0              28d   pool-cfch5xp3i-xonc2 

It looks like the tetragon-wsgmg pod is running on the same node as my nginx-0 pod. Let’s try checking for events on THAT pod:

% kubectl exec -ti -n kube-system tetragon-wsgmg -c tetragon -- tetra getevents -o compact |grep 'default/nginx'
🚀 process default/nginx-0 /usr/bin/git rev-parse HEAD
💥 exit    default/nginx-0 /usr/bin/git rev-parse HEAD 0
🔌 connect default/nginx-0 /usr/bin/ssh tcp -> 

I see logs! What a great thing to see!
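
Matching pods to nodes by eye gets old fast, so the lookup above can be scripted. This is only a rough sketch: tetragon_pod_for is a helper name I made up, and it assumes the agent pods live in kube-system and carry the app.kubernetes.io/name=tetragon label, which is what a default Helm install should give you. Check your own labels before trusting it.

```shell
# Print the Tetragon agent pod that shares a node with a workload pod.
# Hypothetical helper; the namespace and label below are assumptions
# based on a default Helm install and may differ in your cluster.
tetragon_pod_for() {
  pod="$1"
  ns="${2:-default}"
  # Ask Kubernetes which node the workload pod was scheduled on.
  node="$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.nodeName}')" || return 1
  [ -n "$node" ] || return 1
  # List the agent pod running on that same node (printed as pod/<name>).
  kubectl get pods -n kube-system \
    -l app.kubernetes.io/name=tetragon \
    --field-selector "spec.nodeName=$node" \
    -o name
}
```

With that defined, something like kubectl exec -ti -n kube-system "$(tetragon_pod_for nginx-0)" -c tetragon -- tetra getevents -o compact should land on the right agent without me having to read node names.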

Now What, Idiot?

In the end, I do NOT have a logging problem. I have a problem with not looking at what I’m doing. The good news is that I do not have to fix any Tetragon logging issues because there are no logging issues. Tetragon is working exactly as it should and I’m getting everything I need.

This does identify another problem that I have in my environment. All of the events are generated and kept locally on each node. The real problem I am facing is that I need to get the logs off my Kubernetes nodes and into something that lets me see them all in one place. Splunk is my go-to logging platform, so it looks like that’ll be my next step.
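
Until the logs are centralized, one stopgap is to follow the export logs from every agent pod at once and pipe them through a locally installed tetra. Treat this as a sketch under assumptions: tetragon_events_all_nodes is a name I invented, and it relies on the agent pods having an export-stdout container and the app.kubernetes.io/name=tetragon label, which I believe a default Helm install provides.

```shell
# Stream compact events from every Tetragon agent pod in one terminal.
# Hypothetical helper; assumes a local tetra binary plus the default
# label and export-stdout container name from a Helm install.
tetragon_events_all_nodes() {
  # Follow the JSON export logs of all agent pods selected by label,
  # then let the local tetra CLI render them in compact form.
  kubectl logs -n kube-system \
    -l app.kubernetes.io/name=tetragon \
    -c export-stdout -f \
    | tetra getevents -o compact
}
```

Running tetragon_events_all_nodes | grep 'default/nginx' should then show both nginx pods together. Keep in mind that kubectl only follows 5 log streams per selector by default, so a bigger cluster would need --max-log-requests raised.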

Stay tuned for when I look at moving logs from my Kubernetes cluster into Splunk. This should be fun 🙂