fit something with parallel_backend is unreasonably slow

I set up a Dask cluster and tried to reproduce the distributed machine learning tutorial from https://github.com/dask/dask-tutorial/blob/master/08_machine_learning.ipynb. When I run the cell

%%time
with joblib.parallel_backend("dask", scatter=[X, y]):
    grid_search.fit(X, y)

it still has not finished after an hour, which seems unreasonably slow compared with fitting without Dask as the backend. Since it gives no error message, I wonder if anybody here could give me some advice. Thanks!
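One way to check whether the Dask backend itself is the bottleneck is to time the same search both ways. The sketch below assumes scikit-learn and dask.distributed are installed. The synthetic `make_classification` data, the small `SVC` grid, and the helper `timed_fit` are hypothetical stand-ins for the tutorial's `X`, `y`, and `grid_search`, not the tutorial's code. It uses an in-process local client, so both runs use the same machine.

```python
import time

import joblib
from dask.distributed import Client
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def timed_fit(use_dask: bool) -> float:
    """Fit a small (hypothetical) grid search and return elapsed wall-clock seconds."""
    X, y = make_classification(n_samples=300, n_features=20, random_state=0)
    grid_search = GridSearchCV(SVC(gamma="auto"), {"C": [0.1, 1.0, 10.0]}, cv=3)
    start = time.perf_counter()
    if use_dask:
        # Requires an active dask.distributed Client in this process.
        with joblib.parallel_backend("dask", scatter=[X, y]):
            grid_search.fit(X, y)
    else:
        # Default joblib backend, no Dask involved.
        grid_search.fit(X, y)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Local in-process "cluster": same machine, no extra cores.
    client = Client(processes=False, n_workers=1, threads_per_worker=2)
    print(f"default backend: {timed_fit(False):.2f} s")
    print(f"dask backend:    {timed_fit(True):.2f} s")
    client.close()
```

On a single machine the Dask timing is typically not better than the default backend, since a local "cluster" adds scheduling and data-transfer overhead without adding compute.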

Issue Analytics

Created 5 years ago
Comments: 10 (5 by maintainers)

Top GitHub Comments

1 reaction

jakirkham commented, May 3, 2019

"Shouldn't a warning be implemented for such a case?"

Please see PR https://github.com/dask/distributed/pull/2627.

0 reactions

TomAugspurger commented, May 10, 2019

As noted in https://joblib.readthedocs.io/en/latest/auto_examples/parallel/distributed_backend_simple.html#sphx-glr-auto-examples-parallel-distributed-backend-simple-py,

This example shows the simplest usage of the dask distributed backend, on the local computer.

This is useful for prototyping a solution, to later be run on a truly distributed cluster, as the only change to be made is the address of the scheduler.

So you've created a dask.distributed "cluster" locally on your machine, which has no more compute available than fitting without Dask. To make it useful, you need to connect to a real cluster (setup described in https://docs.dask.org/en/latest/setup.html).
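Following the joblib note that only the scheduler address needs to change, here is a minimal sketch of that switch, assuming scikit-learn and dask.distributed are installed. `SCHEDULER_ADDRESS`, `connect`, and `total_worker_threads` are hypothetical names for illustration, and the `LogisticRegression` grid is a stand-in for the tutorial's search. With no address it falls back to a local in-process cluster, and the printed thread count shows how much compute the fit actually has.

```python
import joblib
from dask.distributed import Client
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Hypothetical placeholder: set to your scheduler, e.g. "tcp://<scheduler-host>:8786".
SCHEDULER_ADDRESS = None


def connect(address=None) -> Client:
    """Connect to an existing scheduler, or start a local in-process cluster."""
    if address:
        return Client(address)  # real cluster: only the address differs
    return Client(processes=False)  # local prototype: this machine's cores only


def total_worker_threads(client: Client) -> int:
    # Client.nthreads() maps each worker address to its thread count.
    return sum(client.nthreads().values())


if __name__ == "__main__":
    client = connect(SCHEDULER_ADDRESS)
    print("worker threads available:", total_worker_threads(client))
    X, y = make_classification(n_samples=300, random_state=0)
    grid_search = GridSearchCV(
        LogisticRegression(max_iter=500), {"C": [0.1, 1.0, 10.0]}, cv=3
    )
    with joblib.parallel_backend("dask", scatter=[X, y]):
        grid_search.fit(X, y)
    print("best params:", grid_search.best_params_)
    client.close()
```

Checking the worker thread count before fitting is a quick sanity test: if it matches the local core count, Dask is unlikely to be faster than plain joblib on this machine.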

Top Results From Across the Web

Embarrassingly parallel for loops - Joblib - Read the Docs

This way, you can have fast pickling of all python objects and locally enable slow pickling for interactive functions. An example is given...

What needs to be done to make n_jobs work properly on ...

Note: This answer is from experience from daily basis working to take advantage of sparkjoblib and parallel_backend('spark') and parallel_backend('dask') .

Easy distributed training with Joblib and dask - Tom Augspurger

For large arrays or dataframes this can be slow, and it may blow up ... Client('dask-scheduler:8786') with joblib.parallel_backend("dask", ...