
Operator is leaking files in /tmp, running out of disk space

See original GitHub issue

Please use this only for bug reports. For questions or when you need help, you can use GitHub Discussions, our #strimzi Slack channel, or our user mailing list.

Describe the bug

After running fine for several weeks, I noticed that the operator stopped processing new CRDs. The logs showed errors about /tmp and the disk being full. When I started a shell in the operator pod and checked /tmp/, it was full of files and directories, and the free disk space of the tmpfs was indeed at 0%.

I upgraded the operator to the latest version, which of course restarted it and cleared /tmp. Today I checked again, and new files are already starting to pile up, so it seems the bug is still there.

[strimzi@strimzi-cluster-operator-7845cc6994-8kwt7 strimzi]$ ls -al /tmp/
total 4
drwxrwxrwt 9 root    root  180 Jun  9 11:04 .
drwxr-xr-x 1 root    root 4096 Jun  8 22:04 ..
drwxr-xr-x 2 strimzi root   80 Jun  8 22:04 hsperfdata_strimzi
drwx------ 2 strimzi root   40 Jun  7 01:20 vertx-cache-10d589cd-960b-4b7d-acb8-004c370bbba6
drwx------ 2 strimzi root   40 Jun  3 07:47 vertx-cache-3c368a93-8142-4705-88e8-41e4e61fdfb9
drwx------ 2 strimzi root   40 Jun  1 11:03 vertx-cache-67e1066c-d66f-43e8-9920-b0748d8fa718
drwx------ 2 strimzi root   40 Jun  8 22:04 vertx-cache-8e23f4b7-a71d-4cdf-9e78-f3b401f9ba16
drwx------ 2 strimzi root   40 May 30 08:07 vertx-cache-9316e7c6-33d3-4152-afcd-d0af00aaf7cc
drwx------ 2 strimzi root   40 Jun  5 04:34 vertx-cache-cf4d0cfc-cde3-4178-9d84-7936be517402
[strimzi@strimzi-cluster-operator-7845cc6994-8kwt7 strimzi]$ df /tmp/
Filesystem     1K-blocks  Used Available Use% Mounted on
tmpfs               1024   160       864  16% /tmp

To Reproduce

Steps to reproduce the behavior:

Install operator. Let it run for several weeks or months without restarting the pod.

Expected behavior

Files in /tmp should be deleted when no longer used. /tmp/ should never run out of disk space.

Environment (please complete the following information):

  • Strimzi version: 0.29.0
  • Installation method: Helm chart 0.29.0 from https://strimzi.io/charts
  • Kubernetes cluster: k8s 1.22
  • Infrastructure: on premise

Issue Analytics

  • State:closed
  • Created a year ago
  • Reactions:1
  • Comments:10 (5 by maintainers)

Top GitHub Comments

1 reaction
scholzj commented, Jun 20, 2022

@elluvium Normally, the /tmp directory should contain just one of the vertx-cache-* directories. But when the container restarts inside the Pod, it keeps the same storage, while Vert.x creates a new cache directory. So with repeated container restarts, more and more cache directories are created. Their content is small, but in the end it is only a question of time until they use up all the space. Only when you delete the Pod is the /tmp storage deleted as well, so that the container starts with a clean slate. So the problem has two possible solutions:

  • Stop the container from restarting => this is the ideal solution, but it is not always easy, and the restarts might have many different causes (although the primary issue here seems to be OOM, so it is fairly clear in this case - but there might be other cases as well). When the container doesn’t restart, it keeps using the same cache directory and should not run out of disk space.
  • Clean the storage when it restarts => the container does not need to keep anything in /tmp from the previous run, so by deleting any existing cache directories at startup, you can avoid running out of disk space. This is obviously not as good as not restarting at all, but it should be easy to do and should work for all kinds of situations.
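The second option can be sketched as a small startup helper (a hedged sketch, not the official Strimzi fix; the function name and entrypoint wiring are assumptions):

```shell
#!/bin/sh
# clean_vertx_cache: remove stale Vert.x cache directories left behind
# by previous container runs, so repeated restarts cannot fill the tmpfs.
# Only vertx-cache-* is deleted; other entries (e.g. hsperfdata_*) are kept.
clean_vertx_cache() {
  dir="${1:-/tmp}"
  rm -rf "$dir"/vertx-cache-*
}

# In a container entrypoint this would run before exec'ing the JVM, e.g.:
#   clean_vertx_cache /tmp
#   exec "$@"
```

Alternatively, Vert.x lets you relocate or disable its file cache with the `vertx.cacheDirBase` and `vertx.disableFileCaching` system properties, which sidesteps the problem without a cleanup script.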
0 reactions
scholzj commented, Jun 20, 2022

@elluvium AFAIK, the containers might get restarted individually if you have more of them inside the Pod. What I’m trying to distinguish is the restart when a container exits and is started again (which I guess you could call a Pod restart as well) versus the old Pod being deleted and a new Pod being created by the Deployment / ReplicaSet.

Basically, if you do something like kubectl get pods -o wide, you should see something like this:

NAMESPACE         NAME                                        READY   STATUS    RESTARTS   AGE   IP              NODE                  NOMINATED NODE   READINESS GATES
infra-namespace   strimzi-cluster-operator-5d74667679-b59r9   1/1     Running   0          60s   172.16.94.191   192.168.1.72.xip.io   <none>           <none>

And the RESTARTS column shows the restarts of the containers (or of the Pod if you want).
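To check that count programmatically, one option is a small helper (hypothetical, assuming the default `kubectl get pods` column order, where RESTARTS is the fourth field; with `-A` or `-o wide` the columns shift):

```shell
# restarts_of: given `kubectl get pods --no-headers` output on stdin,
# print the RESTARTS count (4th column in the default layout) for one pod.
restarts_of() {
  awk -v pod="$1" '$1 == pod { print $4 }'
}

# Usage:
#   kubectl get pods --no-headers | restarts_of strimzi-cluster-operator-5d74667679-b59r9
```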
