Unable to launch multiple runs with Joblib
Hi, thanks for the awesome library. I found it really helpful for my research!
However, I have encountered some issues when trying to launch multiple runs with joblib.
from math import sqrt
from joblib import Parallel, delayed
import wandb

def f(x):
    wandb.init(project="symppl", reinit=True)
    for i in range(10):
        loss = i
        # Log metrics with wandb
        # wandb.log({"Loss": loss})
    wandb.finish()
    return sqrt(x)

def main():
    res = Parallel(n_jobs=2)(delayed(f)(i**2) for i in range(4))
    print(res)

if __name__ == "__main__":
    main()
This will error with the following exception.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
prepare(preparation_data)
File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/spawn.py", line 231, in prepare
set_start_method(data['start_method'], force=True)
File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/context.py", line 247, in set_start_method
self._actual_context = self.get_context(method)
File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/context.py", line 239, in get_context
return super().get_context(method)
File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/context.py", line 193, in get_context
raise ValueError('cannot find context for %r' % method) from None
ValueError: cannot find context for 'loky'
Changing joblib’s backend to multiprocessing also causes an error.
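The only change relative to the script above is the `backend` argument to `Parallel` (visible in the traceback below); a minimal sketch of that variant, with the wandb calls elided since they are what triggers the "daemonic processes are not allowed to have children" assertion:

```python
from math import sqrt
from joblib import Parallel, delayed

def f(x):
    # wandb.init(...) / wandb.finish() omitted here; with them present,
    # the multiprocessing backend fails because wandb tries to spawn a
    # child process from joblib's daemonic worker processes
    return sqrt(x)

if __name__ == "__main__":
    # explicitly select the multiprocessing backend instead of the loky default
    res = Parallel(n_jobs=2, backend="multiprocessing")(delayed(f)(i**2) for i in range(4))
    print(res)  # [0.0, 1.0, 2.0, 3.0]
```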
Problem at: /Users/ethan/dev/wandb/run.py 6 f
Traceback (most recent call last):
File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 575, in init
run = wi.init()
File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 367, in init
backend.ensure_launched(
File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/backend/backend.py", line 81, in ensure_launched
self.wandb_process.start()
File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/process.py", line 118, in start
assert not _current_process._config.get('daemon'), \
AssertionError: daemonic processes are not allowed to have children
wandb: ERROR Abnormal program exit
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 575, in init
run = wi.init()
File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 367, in init
backend.ensure_launched(
File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/backend/backend.py", line 81, in ensure_launched
self.wandb_process.start()
File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/process.py", line 118, in start
assert not _current_process._config.get('daemon'), \
AssertionError: daemonic processes are not allowed to have children
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 595, in __call__
return self.func(*args, **kwargs)
File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/joblib/parallel.py", line 252, in __call__
return [func(*args, **kwargs)
File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/joblib/parallel.py", line 252, in <listcomp>
return [func(*args, **kwargs)
File "/Users/ethan/dev/wandb/run.py", line 6, in f
wandb.init(project="symppl", reinit=True)
File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 612, in init
six.raise_from(Exception("problem"), error_seen)
File "<string>", line 3, in raise_from
Exception: problem
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "run.py", line 19, in <module>
main()
File "run.py", line 15, in main
res = Parallel(n_jobs=2, backend="multiprocessing")(delayed(f)(i**2) for i in range(4))
File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/joblib/parallel.py", line 1042, in __call__
self.retrieve()
File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/joblib/parallel.py", line 921, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
Exception: problem
/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 12 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
My workflow consists of sweeping over hyper-parameters with Hydra. This issue currently blocks me from using Hydra with a parallel launcher (e.g. joblib). I think it would also block support for Hydra multiruns; see https://github.com/wandb/client/issues/1233#issuecomment-693139976
Is there a way that I can work around this issue?
Issue Analytics
- Created: 3 years ago
- Reactions: 1
- Comments: 18 (7 by maintainers)
Top GitHub Comments
Hey @tiborsekera, with the current release of the client library you can set the following environment variable so that everything works with joblib:
os.environ["WANDB_START_METHOD"] = "thread"
@ethanluoyc
Thanks for reporting this. As a temporary workaround until we fix this, you can do:
The only other change you will need in your sample script is to change wandb.finish() to wandb.join()
We are using a different approach in 0.10.x that appears to be conflicting with joblib’s multiprocessing implementations.