Dask backend auto-scattering overloads scheduler memory
Cross-posting from https://github.com/dask/dask-ml/issues/789, but I think the better home for this might be here.
I seem to be running into problems using the Dask distributed backend for joblib with scikit-learn classes. This notebook has the full reproducible example using a FargateCluster (the issue does not happen with a LocalCluster): https://nbviewer.jupyter.org/gist/rikturr/66427bd13e692726044b4903a790f013
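For reference, the snippets below assume roughly the following setup. This is a hedged sketch, not the notebook code: the estimator, parameter grid, data shape, and cluster arguments are placeholders, chosen so the training array is roughly 50MB.

# Hedged sketch of the setup assumed by the snippets below; names and sizes are illustrative.
import numpy as np
import joblib
from dask.distributed import Client
from dask_cloudprovider.aws import FargateCluster  # import path for recent dask-cloudprovider releases
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

cluster = FargateCluster(n_workers=4)  # placeholder worker count
client = Client(cluster)

# roughly 50MB of dense float64 training data (65,000 x 100 x 8 bytes)
data = np.random.random((65_000, 100))
target = np.random.randint(0, 2, size=65_000)

search = RandomizedSearchCV(
    RandomForestClassifier(),
    {'n_estimators': [50, 100, 200], 'max_depth': [3, 5, None]},
    n_iter=10,
)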
This part fails with a ~50MB data size:
with joblib.parallel_backend('dask'):
    search.fit(data, target)
It also fails if I scatter the objects beforehand:
client.scatter([data, target])
with joblib.parallel_backend('dask'):
    search.fit(data, target)
It works properly if I manually scatter within parallel_backend like so:
with joblib.parallel_backend('dask', scatter=[data, target]):
    search.fit(data, target)
This leads me to believe that the auto-scattering is causing a large amount of memory to be passed through the scheduler at once.
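One way to check that hypothesis directly (a rough sketch, not from the notebook; it assumes psutil is installed on the scheduler and reuses the client, search, data, and target from above) is to read the scheduler process's resident memory around the fit, in addition to watching the dashboard:

# Hedged sketch: compare the scheduler's resident memory before and after the fit.
# Requires psutil on the scheduler node; scheduler_rss_mb is an illustrative helper.
import psutil

def scheduler_rss_mb():
    return psutil.Process().memory_info().rss / 1e6

print('scheduler RSS before fit (MB):', client.run_on_scheduler(scheduler_rss_mb))
with joblib.parallel_backend('dask'):
    search.fit(data, target)
print('scheduler RSS after fit (MB):', client.run_on_scheduler(scheduler_rss_mb))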
Top GitHub Comments
I released joblib 1.0.1 with this fix. Closing.
Just tested - master is working great! Tried it out with a 5GB object: the scheduler gets up to 5GB for a few seconds, then drops back down once the workers take over. I was previously on 1.0, so something that happened in master since then seems to have fixed it 😃
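For anyone who lands here with the same symptom, confirming that the installed joblib carries the fix is a quick check. A minimal sketch, assuming the fix shipped in 1.0.1 as stated above:

# Verify the running environment has joblib 1.0.1 or later, per the maintainer comment above.
import joblib
print(joblib.__version__)  # expect '1.0.1' or newer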