In my previous post, From Reactive to Proactive: Transforming Security with Tetragon, I got Tetragon installed and working. After doing some digging, it looked like I was only getting events and logs from one of the Kubernetes nodes in my cluster. This article focuses on how I noticed this and how to fix Tetragon logging issues.
Identifying That I Had a Logging Issue
While poking around at the logs and events from Tetragon, I noticed that I wasn’t getting logs and events from all of my pods. As an example, I have the following two nginx pods on separate Kubernetes nodes:
% kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE
nginx-0 2/2 Running 19 (4d2h ago) 27d 10.244.0.73 pool-cfch5xp3i-xonc2
nginx-1 2/2 Running 20 (4d2h ago) 27d 10.244.0.223 pool-cfch5xp3i-xoncp
If I run the tetra getevents command, I only see events from my nginx-1 pod:
% kubectl exec -ti -n kube-system ds/tetragon -c tetragon -- tetra getevents -o compact |grep 'default/nginx'
🚀 process default/nginx-1 /usr/bin/git rev-parse HEAD
💥 exit default/nginx-1 /usr/bin/git rev-parse HEAD 0
🔌 connect default/nginx-1 /usr/bin/ssh tcp 10.244.0.223:58882 -> 192.30.255.112:22
I do not see any events coming from my nginx-0 pod. I tried a bunch of different things to force events, but none showed up.
Identifying That I am an Idiot
I took a step back from what I was doing and realized that I was an idiot. I did not have a logging problem after all! The problem was actually in the tetragon command that I was running. I was still working off of the examples provided in the documentation:
% kubectl exec -ti -n kube-system ds/tetragon -c tetragon -- tetra getevents
In case you missed it in the command, the examples have you run it against ds/tetragon. This tells Kubernetes to run the tetra command on just one of the pods in the tetragon daemonset, so I only ever saw events from the node that pod happened to be running on. Now I feel like a complete moron.
Looking For Logs in All the Right Places
If I look at this again, I can see that my nginx-0 pod is running on the pool-cfch5xp3i-xonc2 node. If I check the tetragon pods in my kube-system namespace, I can spot the one that is running on the same node:
% kubectl -n kube-system get pod -o wide|grep tetra
tetragon-2twpw 2/2 Running 0 28d 10.138.166.95 pool-cfch5xp3i-xoncp
tetragon-b2d66 2/2 Running 0 28d 10.138.90.146 pool-cfch5xp3i-xoncs
tetragon-operator-5f5489bfd9-4k4vd 1/1 Running 0 20m 10.244.0.170 pool-cfch5xp3i-xoncp
tetragon-wsgmg 2/2 Running 0 28d 10.138.102.22 pool-cfch5xp3i-xonc2
It looks like the tetragon-wsgmg pod is running on the same node as my nginx-0 pod. Let’s try checking for events on THAT pod:
% kubectl exec -ti -n kube-system tetragon-wsgmg -c tetragon -- tetra getevents -o compact |grep 'default/nginx'
🚀 process default/nginx-0 /usr/bin/git rev-parse HEAD
💥 exit default/nginx-0 /usr/bin/git rev-parse HEAD 0
🔌 connect default/nginx-0 /usr/bin/ssh tcp 10.244.0.73:39880 -> 192.30.255.112:22
I see logs! What a great thing to see!
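Matching pods to nodes by eye gets old quickly, so here is a minimal sketch of how the lookup could be scripted. The function name tetragon_pod_for is my own invention, and it assumes that the Tetragon agent pods live in kube-system and carry the app.kubernetes.io/name=tetragon label, which may not hold for every install.

```shell
# Print the Tetragon pod that shares a node with a given workload pod.
# Usage: tetragon_pod_for <pod-name> [namespace]
# Hypothetical helper; not part of kubectl or Tetragon.
tetragon_pod_for() {
  local pod="$1" ns="${2:-default}" node

  # Ask Kubernetes which node the workload pod is scheduled on.
  node="$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.nodeName}')" || return 1

  # Assumption: Tetragon agent pods run in kube-system with this label.
  kubectl get pods -n kube-system \
    -l app.kubernetes.io/name=tetragon \
    --field-selector "spec.nodeName=${node}" \
    -o name
}
```

With that loaded in my shell, the earlier command could become: kubectl exec -ti -n kube-system "$(tetragon_pod_for nginx-0)" -c tetragon -- tetra getevents -o compact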
Now What, Idiot?
In the end, I do NOT have a logging problem. I have a problem with not looking at what I’m doing. The good news is that I do not have to fix Tetragon logging issues because there are no logging issues. Tetragon is working exactly as it should and I’m getting everything I need.
This does identify another problem that I have in my environment. All of the events are being generated locally within each node. The real problem I am facing is that I need to get the logs off my Kubernetes nodes and into something that lets me see them all in one central place. I prefer to use Splunk as my go-to logging platform, so it looks like that’ll be my next step.
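Until then, a rough way to watch every node from one terminal is to follow the logs of all the Tetragon pods by label and pipe them through a local tetra binary. This is only a sketch under a few assumptions: the export-stdout sidecar container is enabled, the pods carry the app.kubernetes.io/name=tetragon label, and the tetra CLI is installed on my workstation. The function name tetragon_events_all is made up for this example.

```shell
# Stream Tetragon events from every node at once, filtered by a pattern.
# Usage: tetragon_events_all [grep-pattern]
# Hypothetical helper; not part of kubectl or Tetragon.
tetragon_events_all() {
  local pattern="${1:-.}"

  # Assumptions: export-stdout sidecar enabled, label below present on the
  # agent pods, and a local tetra CLI to render the JSON events.
  # --max-log-requests raises the limit on pods followed concurrently.
  kubectl logs -n kube-system \
    -l app.kubernetes.io/name=tetragon \
    -c export-stdout \
    --max-log-requests 20 \
    -f \
  | tetra getevents -o compact \
  | grep -- "$pattern"
}
```

Running tetragon_events_all 'default/nginx' should then show events for both nginx pods, no matter which node each one landed on. It is a stopgap for eyeballing events, not a replacement for shipping them to a real logging platform.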
Stay tuned for when I look at moving logs into Splunk from my Kubernetes cluster. This should be fun 🙂