Performance degradation with joblib==0.14.0

Description

There is a significant performance degradation in multi-core mode with joblib 0.14.0 compared to joblib 0.13.2.

Steps/Code to Reproduce

The following code demonstrates the issue. It runs the same fit with n_jobs set to 1, 2, 4 and 8, and prints the mean time per run in seconds for each setting.

from timeit import timeit

import numpy as np
from sklearn.linear_model import TheilSenRegressor


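def test(n_jobs):
    # Tiny dataset: 10 samples, 1 feature, constant target.
    x = np.array(range(10))
    X = x[:, np.newaxis]
    y = [0] * 10

    # Fit and score with the requested number of parallel jobs.
    reg = TheilSenRegressor(random_state=0, n_jobs=n_jobs).fit(X, y)
    reg.score(X, y)


if __name__ == '__main__':
    for n_jobs in [1, 2, 4, 8]:
        # Print n_jobs and the mean time per run (seconds) over 10 runs.
        print(n_jobs, '%.3f' % (timeit(lambda: test(n_jobs=n_jobs), number=10) / 10))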

Expected Results

No performance difference is expected between joblib 0.13.2 and joblib 0.14.0.

Actual Results

With 0.13.2:

1 0.001
2 0.043
4 0.005
8 0.013

With 0.14.0:

1 0.001
2 0.472
4 0.563
8 0.868
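
To put the two result sets side by side, the slowdown per n_jobs value can be computed directly from the numbers reported above. This is only a small arithmetic sketch: the dictionary and function names are made up for illustration, and the inputs are rounded to the millisecond, so the ratios are approximate.

```python
# Mean seconds per run, copied from the results reported above.
TIMINGS_0_13_2 = {1: 0.001, 2: 0.043, 4: 0.005, 8: 0.013}
TIMINGS_0_14_0 = {1: 0.001, 2: 0.472, 4: 0.563, 8: 0.868}


def slowdown_factors(before, after):
    """Return after/before for every n_jobs value present in both runs."""
    return {n: after[n] / before[n] for n in sorted(before) if n in after}


if __name__ == "__main__":
    factors = slowdown_factors(TIMINGS_0_13_2, TIMINGS_0_14_0)
    for n_jobs, factor in factors.items():
        print(f"n_jobs={n_jobs}: 0.14.0 takes {factor:.1f}x the time of 0.13.2")
```

With these inputs the single-job case is unchanged, while the multi-job cases come out roughly 11x (n_jobs=2), 113x (n_jobs=4) and 67x (n_jobs=8) slower.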

Code profiling shows that most of the time is spent in sklearn.externals.joblib._parallel_backends.LokyBackend#wrap_future_result
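
The report does not say how the profile was collected. One way to get a similar breakdown is the standard library's cProfile, sketched below. The helper name `profile_top` and the `workload` function are placeholders for illustration; in practice `workload` would be replaced by a call such as `test(n_jobs=2)` from the reproduction script. Note that cProfile only sees the calling process, so time spent inside worker processes shows up as waiting in the parent.

```python
import cProfile
import io
import pstats


def profile_top(func, *args, limit=10, **kwargs):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args, **kwargs)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


def workload(n):
    # Placeholder for the code under investigation, e.g. test(n_jobs=2).
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    print(profile_top(workload, 100_000))
```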

Versions

System:
    python: 3.6.8 (default, Aug 20 2019, 17:12:48)  [GCC 8.3.0]
executable: /tmp/.venv/bin/python
   machine: Linux-4.15.0-58-generic-x86_64-with-Ubuntu-18.04-bionic

Python deps:
       pip: 19.2.3
setuptools: 41.2.0
   sklearn: 0.21.3
     numpy: 1.17.2
     scipy: 1.3.1
    Cython: None
    pandas: None

$ pip freeze
joblib==0.14.0
numpy==1.17.2
scikit-learn==0.21.3
scipy==1.3.1

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

ogrisel commented, Dec 6, 2019

I could reproduce without scikit-learn so this is not scikit-learn related. Closing in favor of joblib/joblib#967.
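
The comment above does not include the scikit-learn-free reproducer; that lives in joblib/joblib#967. As a rough sketch of what such a check might look like, the snippet below times `joblib.Parallel` on a handful of near-instant tasks, assuming the overhead comes from dispatching small jobs to worker processes. The function names are hypothetical, and this may or may not show the same slowdown as the original report.

```python
from timeit import timeit


def average_runtime(run, repeats=10):
    """Return the mean wall-clock seconds of calling run() `repeats` times."""
    return timeit(run, number=repeats) / repeats


def dispatch_trivial_tasks(n_jobs, n_tasks=10):
    """Send n_tasks near-instant tasks through joblib.Parallel."""
    # Imported here so average_runtime stays usable without joblib installed.
    from joblib import Parallel, delayed

    return Parallel(n_jobs=n_jobs)(delayed(abs)(-i) for i in range(n_tasks))


def benchmark(n_jobs_values=(1, 2, 4, 8)):
    """Print n_jobs and the mean seconds per run, as in the original script."""
    for n_jobs in n_jobs_values:
        seconds = average_runtime(lambda: dispatch_trivial_tasks(n_jobs))
        print(n_jobs, "%.3f" % seconds)
```

Calling `benchmark()` under an `if __name__ == '__main__':` guard, once with joblib 0.13.2 and once with 0.14.0 installed, gives output in the same format as the results in the issue description.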

shakhat commented, Dec 10, 2019

@ogrisel thanks a lot!
