Performance degradation with joblib==0.14.0
Description
There is a significant performance degradation in multi-core mode (n_jobs > 1) with joblib 0.14.0 compared to joblib 0.13.2.
Steps/Code to Reproduce
The following script demonstrates the issue. It fits the model with n_jobs set to 1, 2, 4 and 8 and prints the average time per call in seconds.
```python
from timeit import timeit

import numpy as np
from sklearn.linear_model import TheilSenRegressor


def test(n_jobs):
    # Tiny problem (10 samples, 1 feature), so any parallel overhead dominates the runtime.
    x = np.array(range(10))
    X = x[:, np.newaxis]
    y = [0] * 10
    reg = TheilSenRegressor(random_state=0, n_jobs=n_jobs).fit(X, y)
    reg.score(X, y)


if __name__ == '__main__':
    for n_jobs in [1, 2, 4, 8]:
        # Average seconds per call over 10 runs.
        print(n_jobs, '%.3f' % (timeit(lambda: test(n_jobs=n_jobs), number=10) / 10))
```
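The script above averages 10 calls. That average mixes one-off costs, such as starting worker processes, with the steady-state cost of each call. A variant that repeats the measurement and also reports the best run can help separate the two. This is only a sketch: `time_per_call` is a hypothetical helper and is not part of the original report.

```python
import timeit


def time_per_call(func, number=10, repeat=5):
    """Return (best, mean) seconds per call over `repeat` runs of `number` calls each."""
    # timeit.repeat accepts a zero-argument callable as the statement to time.
    totals = timeit.repeat(func, number=number, repeat=repeat)
    per_call = [total / number for total in totals]
    return min(per_call), sum(per_call) / len(per_call)


if __name__ == "__main__":
    # Stand-in workload; swap in `lambda: test(n_jobs=n_jobs)` from the reproducer.
    best, mean = time_per_call(lambda: sum(range(10_000)))
    print(f"best {best:.6f}s  mean {mean:.6f}s")
```

A large gap between best and mean suggests that the first calls pay a startup cost that later calls do not.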
Expected Results
There should be no difference in performance between joblib 0.13.2 and 0.14.0.
Actual Results
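Mean time per call to test(), in seconds:

| n_jobs | joblib 0.13.2 | joblib 0.14.0 |
| --- | --- | --- |
| 1 | 0.001 | 0.001 |
| 2 | 0.043 | 0.472 |
| 4 | 0.005 | 0.563 |
| 8 | 0.013 | 0.868 |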
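The slowdown factors implied by these results can be computed directly. The small sketch below copies the timings from the results above; the `slowdowns` helper is hypothetical. The reported times are rounded to milliseconds, so the ratios are only rough. This applies most to n_jobs=4, where the 0.13.2 time is just 0.005 s.

```python
# Mean seconds per call, copied from the Actual Results: n_jobs -> (0.13.2, 0.14.0).
timings = {
    1: (0.001, 0.001),
    2: (0.043, 0.472),
    4: (0.005, 0.563),
    8: (0.013, 0.868),
}


def slowdowns(table):
    """Return {n_jobs: new_time / old_time}, rounded to one decimal place."""
    return {n: round(new / old, 1) for n, (old, new) in table.items()}


if __name__ == "__main__":
    for n_jobs, ratio in slowdowns(timings).items():
        print(f"n_jobs={n_jobs}: {ratio}x")
```

With these numbers, n_jobs=1 is unchanged, while the 0.14.0 runs are roughly 11x (n_jobs=2), 113x (n_jobs=4) and 67x (n_jobs=8) slower.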
Profiling shows that most of the time is spent in sklearn.externals.joblib._parallel_backends.LokyBackend#wrap_future_result.
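cProfile only instruments the process it runs in. When joblib dispatches work to separate worker processes, the parent's profile therefore attributes the waiting time to whichever call blocks on the results. That is presumably why wrap_future_result dominates here. The sketch below shows one way to capture such a report with the standard library's cProfile and pstats. The names `profile_call`, `workload` and `slow_step` are hypothetical, and `time.sleep` stands in for blocking on a worker's result.

```python
import cProfile
import io
import pstats
import time


def profile_call(func, *args, limit=10, **kwargs):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(limit)  # keep only the top `limit` rows
    return result, stream.getvalue()


def slow_step():
    # Stand-in for a call that blocks, e.g. waiting on a worker's result.
    time.sleep(0.05)


def workload():
    for _ in range(3):
        slow_step()
    return "done"


if __name__ == "__main__":
    _, report = profile_call(workload, limit=5)
    print(report)
```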
Versions
System:
python: 3.6.8 (default, Aug 20 2019, 17:12:48) [GCC 8.3.0]
executable: /tmp/.venv/bin/python
machine: Linux-4.15.0-58-generic-x86_64-with-Ubuntu-18.04-bionic
Python deps:
pip: 19.2.3
setuptools: 41.2.0
sklearn: 0.21.3
numpy: 1.17.2
scipy: 1.3.1
Cython: None
pandas: None
$ pip freeze
joblib==0.14.0
numpy==1.17.2
scikit-learn==0.21.3
scipy==1.3.1
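The version block above has no joblib entry, which is why the report adds `pip freeze`. The sketch below prints only the relevant package versions using the standard library. It assumes Python 3.8+ for `importlib.metadata`; on the Python 3.6 used in this report, the `importlib_metadata` backport from PyPI provides the same calls. `package_versions` is a hypothetical helper.

```python
from importlib import metadata  # Python 3.8+


def package_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in package_versions(["joblib", "numpy", "scipy", "scikit-learn"]).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```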
I could reproduce this without scikit-learn, so it is not a scikit-learn issue. Closing in favor of joblib/joblib#967.
@ogrisel thanks a lot!