In my previous post, From Reactive to Proactive: Transforming Security with Tetragon, I got Tetragon installed and working. After doing some digging, I found that I was only seeing events and logs for one of the Kubernetes nodes in my cluster. This article focuses on how I noticed this and how to fix Tetragon logging issues.
Identifying That I Had a Logging Issue
While poking around at the logs and events from Tetragon, I noticed that I wasn’t getting logs and events from all of my pods. As an example, I have the following two nginx pods on separate Kubernetes nodes:
% kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE
nginx-0 2/2 Running 19 (4d2h ago) 27d blog.shellnetsecurity.com pool-cfch5xp3i-xonc2
nginx-1 2/2 Running 20 (4d2h ago) 27d blog.shellnetsecurity.com pool-cfch5xp3i-xoncp
If I run the tetra getevents command, I’m only seeing events from my nginx-1 pod:
% kubectl exec -ti -n kube-system ds/tetragon -c tetragon -- tetra getevents -o compact |grep 'default/nginx'
🚀 process default/nginx-1 /usr/bin/git rev-parse HEAD
💥 exit default/nginx-1 /usr/bin/git rev-parse HEAD 0
🔌 connect default/nginx-1 /usr/bin/ssh tcp blog.shellnetsecurity.com:58882 -> 184.108.40.206:22
I do not see any events coming from my nginx-0 pod. I did a bunch of different things to try to force events, but none showed up.
Identifying That I am an Idiot
I took a step back from what I was doing and then I realized that I was an idiot. I did not have a logging problem after all! The problem was actually in the tetragon command that I was running. I was still working off of the examples provided in the documentation:
% kubectl exec -ti -n kube-system ds/tetragon -c tetragon -- tetra getevents
In case you missed it in the command, the examples have you run it against ds/tetragon. This tells Kubernetes to run the tetra command on just one of the pods in the tetragon daemonset, so I was only ever seeing events from the node that pod happens to run on. Now I feel like a complete moron.
Looking For Logs in All the Right Places
If I look at this again, I can see that my nginx-0 pod is running on the pool-cfch5xp3i-xonc2 node. If I check the tetragon pods in my kube-system namespace, I can spot the one that is running on the same node:
% kubectl -n kube-system get pod -o wide|grep tetra
tetragon-2twpw 2/2 Running 0 28d blog.shellnetsecurity.com pool-cfch5xp3i-xoncp
tetragon-b2d66 2/2 Running 0 28d blog.shellnetsecurity.com pool-cfch5xp3i-xoncs
tetragon-operator-5f5489bfd9-4k4vd 1/1 Running 0 20m blog.shellnetsecurity.com pool-cfch5xp3i-xoncp
tetragon-wsgmg 2/2 Running 0 28d blog.shellnetsecurity.com pool-cfch5xp3i-xonc2
It looks like the tetragon-wsgmg pod is running on the same node as my nginx-0 pod. Let’s try checking for events on THAT pod:
% kubectl exec -ti -n kube-system tetragon-wsgmg -c tetragon -- tetra getevents -o compact |grep 'default/nginx'
🚀 process default/nginx-0 /usr/bin/git rev-parse HEAD
💥 exit default/nginx-0 /usr/bin/git rev-parse HEAD 0
🔌 connect default/nginx-0 /usr/bin/ssh tcp blog.shellnetsecurity.com:39880 -> 220.127.116.11:22
I see logs! What a great thing to see!
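Matching the node by eye works, but the lookup is easy to script. Here is a minimal sketch of how that might look. The helper name tetragon_pod_for is my own invention, and it assumes the agent pods live in kube-system and are named tetragon-<suffix>, as in the listing above:

```shell
# Print the Tetragon agent pod (as "pod/<name>") that runs on the same node
# as the given workload pod. The function name is hypothetical.
# Usage: tetragon_pod_for POD_NAME [NAMESPACE]
tetragon_pod_for() {
  tp_pod="$1"
  tp_ns="${2:-default}"

  # Ask Kubernetes which node the workload pod is scheduled on.
  tp_node=$(kubectl get pod "$tp_pod" -n "$tp_ns" -o jsonpath='{.spec.nodeName}') || return 1
  [ -n "$tp_node" ] || return 1

  # List the kube-system pods on that node and keep the Tetragon agent,
  # skipping the operator pod, which is not a per-node agent.
  # Assumption: agent pods are named "tetragon-<suffix>".
  kubectl get pods -n kube-system --field-selector "spec.nodeName=$tp_node" -o name \
    | grep '^pod/tetragon-' \
    | grep -v '^pod/tetragon-operator-' \
    | head -n 1
}
```

With that defined, something like kubectl exec -ti -n kube-system "$(tetragon_pod_for nginx-0)" -c tetragon -- tetra getevents -o compact should land on the right agent without the manual lookup.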
Now What, Idiot?
In the end, I do NOT have a logging problem. I have a problem with not looking at what I’m doing. The good news is that I do not have to fix Tetragon logging issues because there are no logging issues. Tetragon is working exactly as it should and I’m getting everything I need.
This does identify another problem that I have in my environment. All of the events are being generated locally within each node. The real problem I am facing is that I need to get the logs off my Kubernetes nodes and into something that lets me see them all in one place. I prefer to use Splunk as my go-to logging platform, so it looks like that’ll be my next step.
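Until the events are shipped somewhere central, a rough stopgap is to stream from every agent pod at once. This is only a sketch under the same assumptions as my own setup: the function name tetragon_events_all_nodes is made up, and it expects the agent pods to be named tetragon-<suffix> in kube-system with a container called tetragon:

```shell
# Stream compact events from every Tetragon agent pod at once, prefixing each
# line with the pod it came from. The function name is hypothetical.
tetragon_events_all_nodes() {
  # Agent pods only: names start with "tetragon-" but are not the operator.
  for te_pod in $(kubectl get pods -n kube-system -o name \
      | grep '^pod/tetragon-' \
      | grep -v '^pod/tetragon-operator-'); do
    # No -ti here: the output is piped, so no TTY is needed.
    kubectl exec -n kube-system "$te_pod" -c tetragon -- \
      tetra getevents -o compact \
      | sed "s|^|[$te_pod] |" &
  done
  # Block until every background stream has ended.
  wait
}
```

Running tetragon_events_all_nodes | grep 'default/nginx' should then show events for both nginx pods, each tagged with the agent pod that reported it. It is fine for poking around, but it is no replacement for real centralized logging.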
Stay tuned for when I look at moving logs into Splunk from my Kubernetes cluster. This should be fun 🙂