Fitting with joblib.parallel_backend("dask") is unreasonably slow
I set up a Dask cluster and tried to reproduce the distributed machine learning tutorial from https://github.com/dask/dask-tutorial/blob/master/08_machine_learning.ipynb
When I run the command
%%time
with joblib.parallel_backend("dask", scatter=[X, y]):
    grid_search.fit(X, y)
it does not complete even after an hour. Compared with fitting without Dask as the backend, that seems unreasonably slow. Since it does not give me an error message, I wonder if anybody here could give me some advice. Thanks.
Issue Analytics
- State:
- Created 5 years ago
- Comments:10 (5 by maintainers)
Top GitHub Comments
Please see PR ( https://github.com/dask/distributed/pull/2627 ).
As noted in https://joblib.readthedocs.io/en/latest/auto_examples/parallel/distributed_backend_simple.html#sphx-glr-auto-examples-parallel-distributed-backend-simple-py, you have created a dask.distributed “cluster” locally on your machine. To make it useful, you need to connect to a real cluster (setup described in https://docs.dask.org/en/latest/setup.html).
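For reference, here is a minimal sketch of the difference between a local client and a connection to a running scheduler. It is not the tutorial's code: the dataset, the parameter grid, the helper names `make_small_search` and `run`, and the example scheduler address are all placeholders chosen for illustration. With no address, `run` starts an in-process local client; with an address, it connects to an existing cluster.

```python
import joblib
from dask.distributed import Client
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def make_small_search():
    """Build a tiny grid search on synthetic data (placeholder problem)."""
    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    param_grid = {"C": [0.1, 1.0], "kernel": ["linear", "rbf"]}
    # n_jobs=-1 lets joblib use every slot the active backend offers.
    search = GridSearchCV(SVC(), param_grid, cv=3, n_jobs=-1)
    return search, X, y


def run(scheduler_address=None):
    """Fit the grid search through the joblib "dask" backend.

    scheduler_address is hypothetical, e.g. "tcp://scheduler-address:8786".
    When it is None, an in-process local client is started instead, which
    only exercises the code path and adds no compute beyond this machine.
    """
    if scheduler_address is None:
        client = Client(processes=False, n_workers=1, threads_per_worker=2,
                        dashboard_address=None)
    else:
        client = Client(scheduler_address)
    try:
        search, X, y = make_small_search()
        # scatter sends X and y to the workers once, not once per task.
        with joblib.parallel_backend("dask", client=client, scatter=[X, y]):
            search.fit(X, y)
        return search
    finally:
        client.close()
```

Calling `run()` checks that the backend is wired up correctly. Calling `run("tcp://scheduler-address:8786")` with the address of a real scheduler is the case where extra workers can actually help. On a single machine, the scheduling and serialization overhead may outweigh any gain.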