[CLI]: Opening runs leaks file pointers and semaphores

Describe the bug

Reopening #1447 which was closed in a broad issue cleanup.

Initializing wandb runs appears to leak file pointers, as after opening a few runs, the process crashes with “too many open files.” Quitting the process with Control-C leads to the common leaked semaphore message (and thus may be unrelated?)

My use case: I intend to sequentially launch a group of runs for K-folds cross-validation. Each group must be launched from the same Python file because that file is my sweeps entry point. However, I can’t get through all the folds I need without crashing on “too many open files.”

I’m on wandb 0.12.21, Python 3.9.13, macOS Big Sur 11.6.

Code to reproduce:

import wandb
for i in range(100):
    with wandb.init(entity='exr0nprojects', project='snap', group='useless'):
        print(f"run number {i}")
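
To check whether descriptors really accumulate from one run to the next, the count of open file descriptors can be sampled after each iteration. The following is a minimal sketch, not part of wandb: `count_open_fds` and `fd_counts_over_runs` are hypothetical helper names, and the approach assumes a system that exposes `/dev/fd` (macOS and Linux do). It only sees descriptors held by the calling process, not by any child process.

```python
import os
from typing import Callable, List


def count_open_fds() -> int:
    """Return the number of file descriptors open in this process.

    Assumes /dev/fd exists. Listing the directory briefly opens one
    extra descriptor, so only differences between samples are meaningful.
    """
    return len(os.listdir("/dev/fd"))


def fd_counts_over_runs(run_once: Callable[[int], None], n_runs: int) -> List[int]:
    """Call run_once(i) n_runs times and sample the descriptor count after each call.

    The first element is the baseline taken before any run.
    """
    counts = [count_open_fds()]
    for i in range(n_runs):
        run_once(i)
        counts.append(count_open_fds())
    return counts
```

Passing a function that wraps the body of the loop above (one `wandb.init` context per call) as `run_once` would show whether the count keeps climbing; a steadily increasing list suggests descriptors are not being released between runs.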

The “leaked semaphores” warning appears during the first run, but this may be happening because I’ve been testing in the same shell. The full warning:

/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 54 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

I got the “too many open files” error after the 18th run. The error message is quite long, but appears to just repeat the following segment:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 1043, in init
    run = wi.init()
  File "/usr/local/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 556, in init
    backend.ensure_launched()
  File "/usr/local/lib/python3.9/site-packages/wandb/sdk/backend/backend.py", line 220, in ensure_launched
    self.wandb_process.start()
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 58, in _launch
    self.pid = util.spawnv_passfds(spawn.get_executable(),
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/util.py", line 450, in spawnv_passfds
    errpipe_read, errpipe_write = os.pipe()
OSError: [Errno 24] Too many open files

The above exception was the direct cause of the following exception: [etc]

Additional Files

No response

Environment

WandB version: 0.12.21

OS: MacOS Big Sur 11.6

Python version: 3.9.13

Versions of relevant libraries: No other libraries are needed for reproduction.

Additional Context

No response

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

2 reactions
nate-wandb commented, Aug 9, 2022

Hi @Exr0n, sorry for the delay here. I was finally able to replicate this after lowering my open-file limit with ulimit -n 100. My default limit was 1048575, which is why I could not replicate it earlier. I can also confirm what you are seeing: the files appear to be held by an external process that wandb spins up. I’ll report this to our engineering team.
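
For reproducing the failure without changing shell settings, the soft limit can also be lowered from inside Python for the duration of a test. This is a sketch using the standard `resource` module (POSIX only); `soft_open_file_limit` is a hypothetical helper name, not a wandb API, and it assumes the requested limit does not exceed the hard limit.

```python
import contextlib
import resource


@contextlib.contextmanager
def soft_open_file_limit(limit: int):
    """Temporarily set the soft RLIMIT_NOFILE, restoring the old value on exit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Only the soft limit changes; the hard limit is passed through untouched.
    resource.setrlimit(resource.RLIMIT_NOFILE, (limit, hard))
    try:
        yield
    finally:
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
```

Running the reproduction loop inside `with soft_open_file_limit(100):` should have roughly the same effect as `ulimit -n 100`, since processes spawned inside the block inherit the lowered limit.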

In the meantime, you should be able to unblock yourself by raising the limit with ulimit -n <number_of_file_handles_you_need>.
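
When the script is launched by something other than an interactive shell (a sweep agent, for example), the same workaround can be applied at the top of the entry point. This is a minimal sketch using the standard `resource` module; `raise_open_file_limit` is a hypothetical helper name, and it only postpones the crash rather than fixing the leak. The operating system may still reject a value it considers too large, in which case `setrlimit` raises an exception.

```python
import resource


def raise_open_file_limit(target: int) -> int:
    """Raise the soft RLIMIT_NOFILE to at least `target`, capped at the hard limit.

    Returns the soft limit in effect afterwards. Never lowers the limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to do
    new_soft = max(soft, target)
    if hard != resource.RLIM_INFINITY:
        new_soft = min(new_soft, hard)  # an unprivileged process cannot exceed the hard limit
    if new_soft != soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft
```

Calling `raise_open_file_limit(4096)` before the first `wandb.init` would be one way to use it; the value 4096 is only an illustration and should be sized to the number of runs in the script.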

I’ll follow up once we are able to come up with a fix on this and figure out what the issue is.

Thank you, Nate

0 reactions
owlesia commented, Sep 18, 2022

I also encountered the “Too many open files” error on macOS. Are there any updates, or any workarounds besides raising the open-file limit?
