wandb is leaking file pointers?

Original GitHub issue: #1447

In my code, I create and close wandb loggers for different projects (I have multiple training pipelines running at the same time). I noticed that wandb seems to be leaking open files. This causes problems because eventually there are too many open files and the job is killed.

Here is a script which replicates this problem:

import psutil
import wandb

def print_file_info():
    proc = psutil.Process()
    print('Num open files: %s' % len(proc.open_files()))
    for filename in proc.open_files():
        print('\t%s' % filename.path)

print('Before any WANDB stuff')
print_file_info()
run = wandb.init(project='test', name='test', reinit=True, resume='allow')
run_id = run.id
run.finish()

print('After creating run and getting run ID')
print_file_info()
run = wandb.init(id=run_id, project='test', name='test', reinit=True, resume='allow')
run.finish()

print('After accessing run again')
print_file_info()

test_file = open('test_file.txt', 'w')
print('After creating a normal file pointer')
print_file_info()

test_file.close()
print('After closing that file')
print_file_info()

If I run with WANDB_SILENT=true python wandb_file_leak_test.py, the output will be something like:

Before any WANDB stuff
Num open files: 0


After creating run and getting run ID
Num open files: 1
	/home/alsuhr/Documents/testing/wandb/run-20201031_132102-j0ebkg08/logs/debug.log


After accessing run again
Num open files: 2
	/home/alsuhr/Documents/testing/wandb/run-20201031_132102-j0ebkg08/logs/debug.log
	/home/alsuhr/Documents/testing/wandb/run-20201031_132106-j0ebkg08/logs/debug.log
After creating a normal file pointer
Num open files: 3
	/home/alsuhr/Documents/testing/wandb/run-20201031_132102-j0ebkg08/logs/debug.log
	/home/alsuhr/Documents/testing/wandb/run-20201031_132106-j0ebkg08/logs/debug.log
	/home/alsuhr/Documents/testing/test_file.txt
After closing that file
Num open files: 2
	/home/alsuhr/Documents/testing/wandb/run-20201031_132102-j0ebkg08/logs/debug.log
	/home/alsuhr/Documents/testing/wandb/run-20201031_132106-j0ebkg08/logs/debug.log

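Notice how the test file shows up after it is opened and is gone from the last check once it is closed, while both wandb debug.log files are still listed even though run.finish() was called on each run. What is going on? How can I make sure that these file pointers are actually closed by wandb?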

Thanks!

Forgot to mention:

  • wandb version 0.10.8
  • python version 3.7.6
  • uname -a: Linux bigbox 4.15.0-112-generic #113~16.04.1-Ubuntu SMP Fri Jul 10 04:37:08 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
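
One possible workaround for the question above, as a minimal sketch: if the lingering debug.log handles belong to logging.FileHandler objects that stay attached to a logger after run.finish(), closing and detaching those handlers should release the files. That cause is an assumption, not something confirmed in this thread. The names close_file_handlers and leak_demo are hypothetical and used only for illustration. The example uses only the standard library and a throwaway logger rather than wandb itself.

```python
import logging
import os
import tempfile


def close_file_handlers(logger_name):
    """Close and detach every FileHandler on the named logger.

    Returns the paths of the files that were released.
    """
    logger = logging.getLogger(logger_name)
    released = []
    # Iterate over a copy, because removeHandler mutates logger.handlers.
    for handler in list(logger.handlers):
        if isinstance(handler, logging.FileHandler):
            released.append(handler.baseFilename)
            handler.close()  # closes the underlying file stream
            logger.removeHandler(handler)
    return released


# Stand-in for the suspected leak: a FileHandler left attached after a "run" ends.
log_path = os.path.join(tempfile.mkdtemp(), "debug.log")
demo_logger = logging.getLogger("leak_demo")  # hypothetical logger name
demo_logger.addHandler(logging.FileHandler(log_path))
demo_logger.warning("run finished, handler still attached")

released = close_file_handlers("leak_demo")
print("Released:", released)
```

If the assumption holds, calling close_file_handlers('wandb') after run.finish() might release the debug.log files. Whether wandb registers its handlers under that logger name should be checked against the installed version.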

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 3
  • Comments: 22 (5 by maintainers)

Top GitHub Comments

2 reactions
issue-label-bot[bot] commented, Oct 31, 2020

Issue-Label Bot is automatically applying the label bug to this issue, with a confidence of 0.62. Please mark this comment with 👍 or 👎 to give our bot feedback!

1 reaction
Exr0n commented, Jul 22, 2022

EDIT: the issue didn’t get reopened, so I opened a new one: #3974

Comment to reopen: running into the same issues (both too many open files and leaked semaphores) on wandb 0.12.21, Python 3.9.13, macOS Big Sur 11.6.

Code to reproduce:

import wandb
for i in range(100):
    with wandb.init(entity='exr0nprojects', project='snap', group='useless'):
        print(f"run number {i}")

The “leaked semaphores” warning appears during the first run, but that may be because I’ve been testing in the same shell. The full warning:

/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 54 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

I got the “too many open files” error after the 18th run. The error message is quite long, but appears to just repeat the following segment:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 1043, in init
    run = wi.init()
  File "/usr/local/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 556, in init
    backend.ensure_launched()
  File "/usr/local/lib/python3.9/site-packages/wandb/sdk/backend/backend.py", line 220, in ensure_launched
    self.wandb_process.start()
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 58, in _launch
    self.pid = util.spawnv_passfds(spawn.get_executable(),
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/util.py", line 450, in spawnv_passfds
    errpipe_read, errpipe_write = os.pipe()
OSError: [Errno 24] Too many open files

The above exception was the direct cause of the following exception: [etc]
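
To see how close a process is to the limit behind Errno 24, the descriptor count can be compared with the per-process limit. The following is a rough sketch using only the standard library on Unix. The helper name fd_usage is hypothetical, and the count relies on /proc/self/fd or /dev/fd being available, which is an assumption about the platform.

```python
import os
import resource  # Unix only


def fd_usage():
    """Return (open_fd_count, soft_limit, hard_limit) for this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Prefer /proc/self/fd (Linux); fall back to /dev/fd (e.g. macOS).
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    # The count includes the descriptor opened briefly to read the directory.
    open_fds = len(os.listdir(fd_dir))
    return open_fds, soft, hard


before, soft, hard = fd_usage()
read_end, write_end = os.pipe()  # the same call that failed in the traceback
during, _, _ = fd_usage()
os.close(read_end)
os.close(write_end)
after, _, _ = fd_usage()

print(f"soft limit={soft}, hard limit={hard}")
print(f"open descriptors: before={before}, with pipe={during}, after close={after}")
```

Calling fd_usage() after each run in a loop like the one above should show whether the count keeps growing toward the soft limit, which would be consistent with descriptors not being released between runs.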