
fit something with parallel_backend is unreasonably slow


I set up a Dask cluster and tried to reproduce the distributed machine learning tutorial from https://github.com/dask/dask-tutorial/blob/master/08_machine_learning.ipynb. When I run the command

%%time
with joblib.parallel_backend("dask", scatter=[X, y]):
    grid_search.fit(X, y)

it does not complete even after an hour. Compared to fitting without Dask as a backend, that seems unreasonably slow. Since it does not give me an error message, I wonder if anybody here could give me some advice. Thanks.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

1 reaction
jakirkham commented, May 3, 2019

Shouldn’t a warning be implemented for such a case?

Please see PR ( https://github.com/dask/distributed/pull/2627 ).

0 reactions
TomAugspurger commented, May 10, 2019

As noted in https://joblib.readthedocs.io/en/latest/auto_examples/parallel/distributed_backend_simple.html#sphx-glr-auto-examples-parallel-distributed-backend-simple-py,

This example shows the simplest usage of the dask distributed backend, on the local computer.

This is useful for prototyping a solution, to later be run on a truly distributed cluster, as the only change to be made is the address of the scheduler.

So you’ve created a dask.distributed “cluster” locally on your machine. But to make it useful, you need to connect to a real cluster (setup is described in https://docs.dask.org/en/latest/setup.html).
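
As a rough illustration of the point above, here is a minimal sketch of pointing the joblib dask backend at an already running scheduler rather than a local one. The helper name fit_with_dask and the scheduler address in the usage note are assumptions made for illustration; they do not come from the thread. It also assumes joblib and dask.distributed are installed and that a scheduler with workers is reachable at the given address.

```python
def fit_with_dask(estimator, X, y, scheduler_address):
    """Fit `estimator` through joblib's dask backend on an existing scheduler.

    `fit_with_dask` is a hypothetical helper written for this sketch.
    `scheduler_address` is the address of a scheduler that is already
    running, for example "tcp://scheduler-host:8786" (a made-up host).
    """
    # Imported lazily so the helper can be defined without a cluster at hand.
    import joblib
    from dask.distributed import Client

    # Passing an address connects to that scheduler; calling Client() with
    # no arguments would instead start a local cluster on this machine.
    with Client(scheduler_address):
        # scatter=[X, y] sends the data to the workers once up front,
        # as in the snippet from the original question.
        with joblib.parallel_backend("dask", scatter=[X, y]):
            estimator.fit(X, y)
    return estimator
```

With the grid_search, X and y objects from the question, a call might look like fit_with_dask(grid_search, X, y, "tcp://scheduler-host:8786"), where the address is a placeholder for a real scheduler. Whether this is faster than a plain local fit depends on the size of the data and on how many workers the cluster has.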


