Operator is leaking files in /tmp, running out of disk space
Describe the bug
After running fine for several weeks I noticed that the operator stopped processing new CRDs. When I checked the logs, it printed something about /tmp and the disk being full. When I started a shell in the operator pod and looked at /tmp/, it was full of files and directories, and the tmpfs indeed had 0% free space.
I upgraded the operator to the most recent version, which of course restarted it and cleared /tmp. Today I checked again, and new files have already started to pile up, so it seems the bug is still there.
[strimzi@strimzi-cluster-operator-7845cc6994-8kwt7 strimzi]$ ls -al /tmp/
total 4
drwxrwxrwt 9 root root 180 Jun 9 11:04 .
drwxr-xr-x 1 root root 4096 Jun 8 22:04 ..
drwxr-xr-x 2 strimzi root 80 Jun 8 22:04 hsperfdata_strimzi
drwx------ 2 strimzi root 40 Jun 7 01:20 vertx-cache-10d589cd-960b-4b7d-acb8-004c370bbba6
drwx------ 2 strimzi root 40 Jun 3 07:47 vertx-cache-3c368a93-8142-4705-88e8-41e4e61fdfb9
drwx------ 2 strimzi root 40 Jun 1 11:03 vertx-cache-67e1066c-d66f-43e8-9920-b0748d8fa718
drwx------ 2 strimzi root 40 Jun 8 22:04 vertx-cache-8e23f4b7-a71d-4cdf-9e78-f3b401f9ba16
drwx------ 2 strimzi root 40 May 30 08:07 vertx-cache-9316e7c6-33d3-4152-afcd-d0af00aaf7cc
drwx------ 2 strimzi root 40 Jun 5 04:34 vertx-cache-cf4d0cfc-cde3-4178-9d84-7936be517402
[strimzi@strimzi-cluster-operator-7845cc6994-8kwt7 strimzi]$ df /tmp/
Filesystem 1K-blocks Used Available Use% Mounted on
tmpfs 1024 160 864 16% /tmp
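For reference, the shell used above can be obtained with something like the following (the pod name is taken from the listing above; the namespace and the availability of bash in the image are assumptions):

kubectl exec -it strimzi-cluster-operator-7845cc6994-8kwt7 -n <namespace> -- /bin/bash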
To Reproduce
Steps to reproduce the behavior:
Install operator. Let it run for several weeks or months without restarting the pod.
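A possibly quicker way to trigger it (assuming, as the maintainer comments below explain, that the cache directories accumulate on container restarts) is to force the operator container to restart without deleting the Pod, for example by terminating its main process so the kubelet restarts the container in place. This is only an illustration; pod name and namespace are from this cluster and need adjusting:

kubectl exec -n <namespace> strimzi-cluster-operator-7845cc6994-8kwt7 -- sh -c 'kill 1'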
Expected behavior
Files in /tmp should be deleted when no longer used. /tmp/ should never run out of disk space.
Environment (please complete the following information):
- Strimzi version: 0.29.0
- Installation method: Helm chart 0.29.0 from https://strimzi.io/charts
- Kubernetes cluster: k8s 1.22
- Infrastructure: on premise
Top GitHub Comments
@elluvium Normally, the /tmp directory should have only one of the vertx-cache-* directories. But when the container restarts inside the Pod, it normally keeps the same storage, while Vert.x creates a new cache directory. So with every container restart, another cache directory is created. Their content should be small, but in the end it is only a question of time until they use up all the space. Only when you delete the Pod is the /tmp storage deleted as well and the container starts with a clean slate. So the problem has two possible solutions: avoid restarting the container, or clean up the leftover directories in /tmp from the previous run. By deleting the cache directories at startup if any exist, you can avoid running out of disk space. This is obviously not as perfect as not restarting it, but it should be easy to do and should work for all kinds of different situations.
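For illustration, a startup cleanup along the lines of the second option could look something like this (just a sketch, not the operator's actual fix; the directory pattern is taken from the listing above):

# remove Vert.x cache directories left over from a previous container run,
# then continue with the normal operator startup
find /tmp -maxdepth 1 -type d -name 'vertx-cache-*' -exec rm -rf {} +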
@elluvium AFAIK, the containers might get restarted individually if you have more of them inside the Pod. What I'm trying to distinguish is the restart when a container exits and is started again (which I guess you could call a Pod restart as well) and the old Pod being deleted and a new Pod being created by the Deployment / Replica Set.
Basically, if you do something like kubectl get pods -o wide, you should see the Pods of your Deployment, and the RESTARTS column shows the restarts of the containers (or of the Pod if you want).
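For illustration, such output might look roughly like this (pod name, counts and columns are made up and abbreviated):

NAME                                        READY   STATUS    RESTARTS   AGE   IP           NODE
strimzi-cluster-operator-7845cc6994-8kwt7   1/1     Running   3          42d   10.42.0.17   worker-1

A RESTARTS value greater than 0 with the same Pod name means the container was restarted in place, which is the case where the old vertx-cache-* directories stay around.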