
Misleading ImportError when using Parallel inside a "with Parallel(...) as" block (backend='multiprocessing')

See original GitHub issue
from math import sqrt
from joblib import Parallel, delayed

input_list = [x**2 for x in range(10)]


def main():
    # A managed multiprocessing pool is active for the whole with block;
    # constructing a second, independent Parallel instance inside it is
    # what triggers the ImportError shown below.
    with Parallel(n_jobs=3, backend='multiprocessing') as parallel:
        output = Parallel(n_jobs=2, backend='multiprocessing')(delayed(sqrt)(i) for i in input_list)
        return output

if __name__ == '__main__':
    print(main())
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
/tmp/test_joblib_reload.py in <module>()
     12 
     13 if __name__ == '__main__':
---> 14     print(main())

/tmp/test_joblib_reload.py in main()
      8     with Parallel(n_jobs=2) as parallel:
--->  9         output = Parallel(n_jobs=2)(delayed(sqrt)(i) for i in input_list)
     10         return output
     11 

/home/lesteve/miniconda3/lib/python3.5/site-packages/joblib/parallel.py in __call__(self, iterable)
    764         self._aborting = False
    765         if not self._managed_pool:
--> 766             n_jobs = self._initialize_pool()
    767         else:
    768             n_jobs = self._effective_n_jobs()

/home/lesteve/miniconda3/lib/python3.5/site-packages/joblib/parallel.py in _initialize_pool(self)
    513                 already_forked = int(os.environ.get(JOBLIB_SPAWNED_PROCESS, 0))
    514                 if already_forked:
--> 515                     raise ImportError('[joblib] Attempting to do parallel computing '
    516                             'without protecting your import on a system that does '
    517                             'not support forking. To use parallel-computing in a '

ImportError: [joblib] Attempting to do parallel computing without protecting your import on a system that does not support forking. To use parallel-computing in a script, you must protect your main loop using "if __name__ == '__main__'". Please see the joblib documentation on Parallel for more information

It is misleading because 1) I am on Linux, so my system supports forking, and 2) I am using an `if __name__ == '__main__'` guard. I am not sure what we should do here, or whether there is an easy way to detect this situation.
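For reference, the pattern that does not trip the guard is to call the parallel object yielded by the context manager instead of constructing a second Parallel instance inside the block. A minimal sketch of that variant, reusing the same imports and input_list as above (this is the documented way to reuse a managed pool, not a fix for the misleading message itself):

from math import sqrt
from joblib import Parallel, delayed

input_list = [x**2 for x in range(10)]


def main():
    # Reuse the pool managed by the with block instead of creating a
    # second, independent Parallel instance inside it.
    with Parallel(n_jobs=3, backend='multiprocessing') as parallel:
        return parallel(delayed(sqrt)(i) for i in input_list)


if __name__ == '__main__':
    print(main())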

For completeness, I originally saw the error in https://github.com/scikit-learn/scikit-learn/issues/6258 and only found time to trace it back recently.

Issue Analytics

  • State: open
  • Created: 7 years ago
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

5 reactions
essandess commented, Apr 5, 2018

I see this issue on macOS in a Jupyter notebook working with scikit-learn and multicore processing. This MWE tickles the issue:

import numpy as np, numpy.random as npr
from sklearn import cluster

data = npr.poisson(1,(100,10))
algorithm = cluster.KMeans
algorithm_kwargs = dict(n_clusters=4,n_jobs=-1)
estimator = algorithm(**algorithm_kwargs)
labels = estimator.fit_predict(data)

The code runs as expected when run as a Python script.
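For comparison, a sketch of that script form, wrapped in the `if __name__ == '__main__'` guard the error message asks for; only the guard and the main() wrapper are additions, the rest is the MWE above:

import numpy as np, numpy.random as npr
from sklearn import cluster


def main():
    data = npr.poisson(1, (100, 10))
    # n_jobs=-1 asks scikit-learn (0.19.x here) to run the n_init KMeans
    # restarts in parallel via joblib's multiprocessing backend.
    estimator = cluster.KMeans(n_clusters=4, n_jobs=-1)
    return estimator.fit_predict(data)


if __name__ == '__main__':
    print(main())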

However, the same MWE run under a Jupyter notebook throws this error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-64-5a0071cf17dc> in <module>()
      3 algorithm_kwargs = dict(n_clusters=4,n_jobs=-1)
      4 estimator = algorithm(**algorithm_kwargs)
----> 5 labels = estimator.fit_predict(data)

/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/cluster/k_means_.py in fit_predict(self, X, y)
    915             Index of the cluster each sample belongs to.
    916         """
--> 917         return self.fit(X).labels_
    918 
    919     def fit_transform(self, X, y=None):

/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/cluster/k_means_.py in fit(self, X, y)
    894                 tol=self.tol, random_state=random_state, copy_x=self.copy_x,
    895                 n_jobs=self.n_jobs, algorithm=self.algorithm,
--> 896                 return_n_iter=True)
    897         return self
    898 

/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/cluster/k_means_.py in k_means(X, n_clusters, init, precompute_distances, n_init, max_iter, verbose, tol, random_state, copy_x, n_jobs, algorithm, return_n_iter)
    361                                    # Change seed to ensure variety
    362                                    random_state=seed)
--> 363             for seed in seeds)
    364         # Get results with the lowest inertia
    365         labels, inertia, centers, n_iters = zip(*results)

/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    747         self._aborting = False
    748         if not self._managed_backend:
--> 749             n_jobs = self._initialize_backend()
    750         else:
    751             n_jobs = self._effective_n_jobs()

/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in _initialize_backend(self)
    545         try:
    546             n_jobs = self._backend.configure(n_jobs=self.n_jobs, parallel=self,
--> 547                                              **self._backend_args)
    548             if self.timeout is not None and not self._backend.supports_timeout:
    549                 warnings.warn(

/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py in configure(self, n_jobs, parallel, **backend_args)
    303         if already_forked:
    304             raise ImportError(
--> 305                 '[joblib] Attempting to do parallel computing '
    306                 'without protecting your import on a system that does '
    307                 'not support forking. To use parallel-computing in a '

ImportError: [joblib] Attempting to do parallel computing without protecting your import on a system that does not support forking. To use parallel-computing in a script, you must protect your main loop using "if __name__ == '__main__'". Please see the joblib documentation on Parallel for more information

I’m not sure if this is an issue with joblib or some downstream issue with Jupyter, but it is a problem.

The very same issue arises in related libraries, such as HDBSCAN. The following alternate code tickles the issue with HDBSCAN:

import hdbscan

algorithm = hdbscan.HDBSCAN
algorithm_kwargs = dict(min_cluster_size=10,allow_single_cluster=True)
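The snippet above stops before the fitting step in this copy; presumably it continues in the same way as the KMeans example. A sketch of such a completion, where the data and the fit_predict call are assumptions rather than part of the original comment:

import numpy.random as npr
import hdbscan

data = npr.poisson(1, (100, 10))          # same toy data as the KMeans MWE
algorithm = hdbscan.HDBSCAN
algorithm_kwargs = dict(min_cluster_size=10, allow_single_cluster=True)
estimator = algorithm(**algorithm_kwargs)
labels = estimator.fit_predict(data)      # assumed completion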

2 reactions
bkellerman commented, Jun 20, 2018

I can reproduce this error with this line in Jupyter on macOS:

cross_val_score(clf, X, y, cv=3, scoring='average_precision', n_jobs=-1, pre_dispatch=1)

Changing n_jobs to 1 fixes it.
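For context, a self-contained sketch of that call; clf, X, and y are not shown in the report, so the estimator and data below are placeholders (make_classification and LogisticRegression), and n_jobs=1 is the workaround just mentioned:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data and estimator; the original report does not show clf, X, y.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = LogisticRegression()

# n_jobs=-1 reproduced the error for the reporter in a notebook;
# n_jobs=1 (the workaround above) runs the folds sequentially.
scores = cross_val_score(clf, X, y, cv=3, scoring='average_precision',
                         n_jobs=1, pre_dispatch=1)
print(scores)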

sklearn 0.19.0, joblib 0.11, jupyter 5.1.0, python 3.5.1, macOS 10.13.4

Edit:

Restarting the Jupyter notebook fixes it.
