Joblib with Loky multiprocessing backend unable to use in frozen executable


This issue has bugged me for a week now. I tried to isolate it as well as possible and finally created this GitHub repo as a minimal reproducible example.


Short read: when joblib is used in a dependency (e.g. hdbscan in my example), multiprocessing doesn’t work in frozen Python executables (cx_freeze) on Windows: neither with threaded execution, nor with Queue, nor with multiprocessing.Pool. Standard pool execution results in OSError: [WinError 87] The parameter is incorrect; Queue and ThreadPool lead to a multiprocessing bomb; and using the existing mp.Pool with multiprocessing.get_context("spawn").Pool() doesn’t crash but is unusably slow.

See complete logs for the approaches above.


Considerations:

  • All of this only happens in the frozen package, not when run normally with the Python interpreter (tested in WSL Ubuntu 18.04 and Windows 10).
  • This didn’t happen with the old multiprocessing backend from joblib (back then, HDBSCAN was using joblib from sklearn.externals.joblib), only with the new Loky backend. I’ve moved from Python 3.6 to 3.7 and can’t go back to old joblib/sklearn versions.
  • In my original app, I also get unrecognized arguments: --multiprocessing-fork 3832 (likely because I’m processing args, which I am not doing in the expackage repo).
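The unrecognized arguments error in the last point can be illustrated with a minimal sketch using only the standard library. It assumes the app parses its command line with argparse; the parse_cli helper and the --input option are hypothetical names, not taken from the original app. The sketch calls multiprocessing.freeze_support() before any argument parsing and uses parse_known_args so that an extra flag such as --multiprocessing-fork is collected rather than rejected. Whether this helps with processes started by the Loky backend is not confirmed here.

```python
import argparse
import multiprocessing
import sys


def parse_cli(argv):
    """Parse known options; return (args, extras) instead of failing on unknown flags."""
    parser = argparse.ArgumentParser(description="hypothetical frozen app")
    parser.add_argument("--input", default="data.csv")  # hypothetical option
    # parse_known_args collects unrecognized tokens instead of exiting with an error.
    return parser.parse_known_args(argv)


if __name__ == "__main__":
    # Call this first. In a frozen Windows executable it lets a multiprocessing
    # child process run its worker code; it has no effect when not frozen.
    multiprocessing.freeze_support()
    args, extras = parse_cli(sys.argv[1:])
    print(args.input, extras)
```

With this shape, a command line containing --multiprocessing-fork 3832 leaves those two tokens in extras while --input is still parsed normally.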

I’ve stumbled across a lot of similar reports, and I link those here for completeness:


The expackage repository I set up has 4 branches, with different approaches to execute a function async. Check out the cluster.py in each of these branches:

All of these branches work when executed directly in python, but not when frozen. It’s overkill, but I’ve provided links to the frozen executables (300MB each zip).


I would be very grateful for any hints to solve the problem. I understand the arguments “don’t use Windows” or “don’t freeze Python environments”, but in this case I have no choice: the executable I am working on is for end users on Windows. The reason I need to run fit_cluster in async mode (on a different thread or process) is that the GUI otherwise freezes and crashes. I’ve excluded all the GUI stuff and made sure that it is not the cause of the issue (it isn’t, as is visible in the example repo). This also shouldn’t matter: generally, I would expect that multiprocessing in frozen apps is still supported in Python 3.7, as it includes an extra hook for it (multiprocessing.freeze_support()). If so, this would be an issue in joblib/loky, which is why I am reporting it here.
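The motivation described above, keeping the GUI thread free while a long fit runs, can be sketched with the standard library alone. This is only an illustration of the off-thread pattern: slow_fit is a hypothetical stand-in that does not call hdbscan or joblib, run_in_background and wait_with_ticks are made-up helper names, and the sketch neither reproduces nor fixes the frozen-executable failure.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_fit(points):
    """Hypothetical stand-in for a long-running fit: sleeps, then returns one label per point."""
    time.sleep(0.2)
    return [0 for _ in points]


def run_in_background(func, *args):
    """Submit func to a single worker thread and return (executor, future)."""
    executor = ThreadPoolExecutor(max_workers=1)
    return executor, executor.submit(func, *args)


def wait_with_ticks(future, tick=0.05):
    """Poll the future the way a GUI timer might; return (result, number of ticks)."""
    ticks = 0
    while not future.done():
        ticks += 1  # a real GUI would process pending events here
        time.sleep(tick)
    return future.result(), ticks


if __name__ == "__main__":
    executor, future = run_in_background(slow_fit, [(0, 0), (1, 1), (5, 5)])
    labels, ticks = wait_with_ticks(future)
    executor.shutdown()
    print(labels, ticks)
```

The calling thread stays free between ticks, which is the property the GUI needs; a library that starts its own worker processes inside slow_fit would still be subject to the problem reported here.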

If anyone has an idea but can’t test it for lack of a Windows machine, I’m happy to test pull requests on the linked repo. Thanks so much!

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 2
  • Comments: 13 (2 by maintainers)

Top GitHub Comments

5 reactions
ogrisel commented, Jul 23, 2020

Note: to change the active backend you can do:


from joblib import parallel_backend

parallel_backend("threading")
# my code here

Alternatively, with a context manager:


from joblib import parallel_backend

with parallel_backend("threading"):
    ...  # my code here

3 reactions
mx2048 commented, Jul 23, 2020

@ogrisel, below is a minimal reproducing example.

  1. Create and activate a new virtual environment.
  2. Upgrade pip: python -m pip install --upgrade pip
  3. Install required packages: pip install -U joblib, then pip install cx_freeze==6.1
  4. Create a file sample.py:
from joblib import Parallel, delayed

# Expected output: 0123456789
Parallel(n_jobs=-1, backend='loky')(delayed(print)(i, end='') for i in range(10))

  5. Create a file setup.py:
from cx_Freeze import setup, Executable


executables = [Executable('sample.py')]

build_exe_options = {
    "build_exe": './build/frozen_exe',
    "excludes": ['tkinter'],
    'include_msvcr': True}

setup(name='joblib_issue_1002',
      version='0.1',
      options={"build_exe": build_exe_options},
      executables=executables
      )
  6. Change directory to the root of the project.
  7. Run python setup.py build

Tests:

  • Test with backend='threading': OK, prints 0123456789 to the console.
  • Test with backend='loky': BAD, outputs nothing. On a keyboard interrupt (Ctrl+C) it shows walls of errors and cannot be stopped; the whole process has to be terminated. Careful: ‘loky’ creates hundreds of processes and drives CPU and RAM usage up to 100%.

Additional info:

Virtual environment: Python 3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 22:45:29) [MSC v.1916 32 bit (Intel)] on win32

pip list

Package    Version
---------- -------
cx-Freeze  6.1
joblib     0.16.0
pip        20.1.1
setuptools 40.8.0

You can download a virtual machine with a Windows 10 development environment.
