wandb is leaking file pointers?
In my code, I am creating and closing wandb loggers to different projects (I have multiple training pipelines running at the same time). I noticed that wandb seems to be leaking open files. This is causing problems with my code because at a certain point there are too many open files so the job is killed.
Here is a script which replicates this problem:
import psutil
import wandb
def print_file_info():
    # List the files currently held open by this process.
    proc = psutil.Process()
    print('Num open files: %s' % len(proc.open_files()))
    for filename in proc.open_files():
        print('\t%s' % filename.path)
print('Before any WANDB stuff')
print_file_info()
run = wandb.init(project='test', name='test', reinit=True, resume='allow')
run_id = run.id
run.finish()
print('After creating run and getting run ID')
print_file_info()
run = wandb.init(id=run_id, project='test', name='test', reinit=True, resume='allow')
run.finish()
print('After accessing run again')
print_file_info()
test_file = open('test_file.txt', 'w')
print('After creating a normal file pointer')
print_file_info()
test_file.close()
print('After closing that file')
print_file_info()
If I run with WANDB_SILENT=true python wandb_file_leak_test.py, the output will be something like:
Before any WANDB stuff
Num open files: 0
After creating run and getting run ID
Num open files: 1
	/home/alsuhr/Documents/testing/wandb/run-20201031_132102-j0ebkg08/logs/debug.log
After accessing run again
Num open files: 2
	/home/alsuhr/Documents/testing/wandb/run-20201031_132102-j0ebkg08/logs/debug.log
	/home/alsuhr/Documents/testing/wandb/run-20201031_132106-j0ebkg08/logs/debug.log
After creating a normal file pointer
Num open files: 3
	/home/alsuhr/Documents/testing/wandb/run-20201031_132102-j0ebkg08/logs/debug.log
	/home/alsuhr/Documents/testing/wandb/run-20201031_132106-j0ebkg08/logs/debug.log
	/home/alsuhr/Documents/testing/test_file.txt
After closing that file
Num open files: 2
	/home/alsuhr/Documents/testing/wandb/run-20201031_132102-j0ebkg08/logs/debug.log
	/home/alsuhr/Documents/testing/wandb/run-20201031_132106-j0ebkg08/logs/debug.log
Notice how the test file is opened and then the file pointer to it is gone in the last check. What is going on? How can I make sure that these file pointers are actually closed by wandb?
Thanks!
Forgot to mention:
- wandb version 0.10.8
- python version 3.7.6
- uname -a: Linux bigbox 4.15.0-112-generic #113~16.04.1-Ubuntu SMP Fri Jul 10 04:37:08 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
More details:
- I can replicate this even if I remove reinit=True and resume='allow'.
- The documentation seems to suggest this file is only being written to if WANDB_SILENT=true (https://docs.wandb.com/library/environment-variables#optional-environment-variables). However, I can replicate this even if I remove that environment variable (I just set it above so it’s easier to read the output).
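One possible workaround, as a sketch only: every leaked file in the output above is a debug.log, which suggests (an assumption, not confirmed anywhere in this thread) that the files are held by logging.FileHandler objects attached to the Python logger named "wandb". Under that assumption, closing those handlers after run.finish() should release the descriptors. The helper name close_file_handlers is hypothetical and is not part of wandb; it uses only the standard logging module.

```python
import logging


def close_file_handlers(logger_name="wandb"):
    """Close and detach every logging.FileHandler on the named logger.

    Returns the paths of the files that were closed.
    ASSUMPTION: the leaked debug.log files belong to FileHandlers on this logger.
    """
    logger = logging.getLogger(logger_name)
    closed = []
    # Iterate over a copy because removeHandler mutates logger.handlers.
    for handler in list(logger.handlers):
        if isinstance(handler, logging.FileHandler):
            closed.append(handler.baseFilename)
            logger.removeHandler(handler)
            handler.close()  # flushes and closes the underlying stream
    return closed
```

In the reproduction script this would be called right after each run.finish(), followed by print_file_info() to check whether the debug.log entries disappear. If they do not, the files are being held by something other than a handler on that logger.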
Issue Analytics
- State:
- Created 3 years ago
- Reactions:3
- Comments:22 (5 by maintainers)
Top GitHub Comments
Issue-Label Bot is automatically applying the label bug to this issue, with a confidence of 0.62. Please mark this comment with 👍 or 👎 to give our bot feedback!
Links: app homepage, dashboard and code for this bot.
EDIT: the issue didn’t get reopened, so I opened a new one: #3974
Comment to reopen: running into the same issues (both too many files and leaked semaphores) on wandb=0.12.21, Python 3.9.13, macOS Big Sur 11.6.
Code to reproduce:
The “leaked semaphores” warning appears during the first run, but this may be happening because I’ve been testing in the same shell. The full error:
I got the “too many open files” error after the 18th run. The error message is quite long, but appears to just repeat the following segment:
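
A side note on the “too many open files” error itself: it is raised when the process reaches its soft limit on open file descriptors, and that limit is often low by default (commonly 256 in a macOS shell). Raising it does not fix a leak, but it can delay the failure while the leak is being tracked down. A minimal sketch using the standard resource module (Unix only); raise_open_file_limit and the default target of 4096 are illustrative choices, not something recommended in this thread:

```python
import resource


def raise_open_file_limit(target=4096):
    """Try to raise the soft RLIMIT_NOFILE to `target`.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    if soft == unlimited or soft >= target:
        return soft  # already high enough, nothing to do
    # The soft limit can never exceed the hard limit.
    new_soft = target if hard == unlimited else min(target, hard)
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    except (ValueError, OSError):
        pass  # the OS refused the new value; keep the existing limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Calling raise_open_file_limit() once at the start of the script, before the first wandb.init, applies the new limit to the current process and anything it spawns afterwards.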