Unable to launch multiple runs with Joblib

Hi, thanks for the awesome library. I found it really helpful for my research!

However, I have run into problems when trying to launch multiple runs with joblib.

from math import sqrt
from joblib import Parallel, delayed
import wandb

def f(x):
    wandb.init(project="symppl", reinit=True)
    for i in range(10):
        loss = i
        # Log metrics with wandb
        # wandb.log({"Loss": loss})
    wandb.finish()
    return sqrt(x)

def main():
    res = Parallel(n_jobs=2)(delayed(f)(i**2) for i in range(4))
    print(res)

if __name__ == "__main__":
    main()

This fails with the following exception, raised once in each worker process:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/spawn.py", line 231, in prepare
    set_start_method(data['start_method'], force=True)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/context.py", line 247, in set_start_method
    self._actual_context = self.get_context(method)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/context.py", line 239, in get_context
    return super().get_context(method)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/context.py", line 193, in get_context
    raise ValueError('cannot find context for %r' % method) from None
ValueError: cannot find context for 'loky'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/spawn.py", line 231, in prepare
    set_start_method(data['start_method'], force=True)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/context.py", line 247, in set_start_method
    self._actual_context = self.get_context(method)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/context.py", line 239, in get_context
    return super().get_context(method)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/context.py", line 193, in get_context
    raise ValueError('cannot find context for %r' % method) from None
ValueError: cannot find context for 'loky'

Changing joblib’s backend to multiprocessing also causes an error.

Problem at: /Users/ethan/dev/wandb/run.py 6 f
Problem at: /Users/ethan/dev/wandb/run.py 6 f
Traceback (most recent call last):
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 575, in init
    run = wi.init()
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 367, in init
    backend.ensure_launched(
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/backend/backend.py", line 81, in ensure_launched
    self.wandb_process.start()
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/process.py", line 118, in start
    assert not _current_process._config.get('daemon'), \
AssertionError: daemonic processes are not allowed to have children
Traceback (most recent call last):
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 575, in init
    run = wi.init()
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 367, in init
    backend.ensure_launched(
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/backend/backend.py", line 81, in ensure_launched
    self.wandb_process.start()
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/process.py", line 118, in start
    assert not _current_process._config.get('daemon'), \
AssertionError: daemonic processes are not allowed to have children
wandb: ERROR Abnormal program exit
wandb: ERROR Abnormal program exit
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 575, in init
    run = wi.init()
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 367, in init
    backend.ensure_launched(
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/backend/backend.py", line 81, in ensure_launched
    self.wandb_process.start()
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/process.py", line 118, in start
    assert not _current_process._config.get('daemon'), \
AssertionError: daemonic processes are not allowed to have children

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/joblib/parallel.py", line 252, in __call__
    return [func(*args, **kwargs)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/joblib/parallel.py", line 252, in <listcomp>
    return [func(*args, **kwargs)
  File "/Users/ethan/dev/wandb/run.py", line 6, in f
    wandb.init(project="symppl", reinit=True)
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 612, in init
    six.raise_from(Exception("problem"), error_seen)
  File "<string>", line 3, in raise_from
Exception: problem
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run.py", line 19, in <module>
    main()
  File "run.py", line 15, in main
    res = Parallel(n_jobs=2, backend="multiprocessing")(delayed(f)(i**2) for i in range(4))
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/joblib/parallel.py", line 1042, in __call__
    self.retrieve()
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/site-packages/joblib/parallel.py", line 921, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
Exception: problem
/Users/ethan/anaconda3/envs/symppl/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 12 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

My workflow consists of sweeping over hyper-parameters with Hydra. This issue currently blocks me from using Hydra with a parallel launcher (e.g. joblib). I think it would also block support for Hydra multiruns; see https://github.com/wandb/client/issues/1233#issuecomment-693139976

Is there a way that I can work around this issue?

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 18 (7 by maintainers)

Top GitHub Comments

7 reactions
vanpelt commented, Mar 25, 2021

Hey @tiborsekera, with the current release of the client library you can set the following environment variable so that everything works with joblib:

import os
os.environ["WANDB_START_METHOD"] = "thread"
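
For reference, here is a minimal sketch of how this workaround might be applied to the worker function from the original script. The helper name `use_thread_start_method` is hypothetical and not part of wandb. Setting the variable inside the worker (rather than only in the parent process) is a conservative assumption, made so that it is in place before `wandb.init()` runs regardless of how joblib starts its workers. Whether this resolves the error depends on the installed wandb version.

```python
import os
from math import sqrt


def use_thread_start_method():
    """Hypothetical helper: apply the WANDB_START_METHOD workaround."""
    os.environ["WANDB_START_METHOD"] = "thread"
    return os.environ["WANDB_START_METHOD"]


def f(x):
    # Set the variable inside the worker, before wandb.init() is called,
    # in case the worker process did not inherit it from the parent.
    use_thread_start_method()
    import wandb  # imported lazily so the variable is already set

    wandb.init(project="symppl", reinit=True)
    for i in range(10):
        loss = i
        wandb.log({"Loss": loss})
    wandb.finish()
    return sqrt(x)
```

The rest of the original script stays the same: `main()` still dispatches `f` with `Parallel(n_jobs=2)(delayed(f)(i**2) for i in range(4))` under the `if __name__ == "__main__":` guard.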

2 reactions
raubitsj commented, Nov 19, 2020

@ethanluoyc

Thanks for reporting this. As a temporary workaround until we fix this, you can pin the older release:

pip install wandb==0.9.7

The only other change you will need to make in your sample script is to replace wandb.finish() with wandb.join().

We are using a different approach in 0.10.x that appears to conflict with joblib’s multiprocessing implementations.
