
Dask backend auto-scattering overloads scheduler memory


Cross-posting from https://github.com/dask/dask-ml/issues/789, but I think the better home for this might be here.

I seem to be running into problems using the Dask distributed backend for joblib with scikit-learn classes. This notebook has the full reproducible example using a FargateCluster; the issue does not happen with a LocalCluster: https://nbviewer.jupyter.org/gist/rikturr/66427bd13e692726044b4903a790f013
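
For context, here is a minimal sketch of the kind of setup the snippets below assume. It is illustrative only: the actual notebook builds a dask_cloudprovider FargateCluster, and the scheduler address, estimator, and parameter grid here are made up, not taken from it.

import numpy as np
import joblib
from dask.distributed import Client
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Connect to the remote cluster (placeholder address; the notebook creates a FargateCluster)
client = Client('<scheduler-address>')

# Roughly 50 MB of float64 features, matching the size that triggers the failure
data = np.random.rand(65_000, 100)
target = np.random.randint(0, 2, size=65_000)

search = RandomizedSearchCV(
    RandomForestClassifier(),
    param_distributions={'n_estimators': [50, 100, 200], 'max_depth': [5, 10, None]},
    n_iter=5,
)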

This part fails with a ~50 MB data size:

with joblib.parallel_backend('dask'):
    search.fit(data, target)

It also fails if I scatter the objects beforehand:

client.scatter([data, target])
with joblib.parallel_backend('dask'):
    search.fit(data, target)

It works properly if I manually scatter within parallel_backend, like so:

with joblib.parallel_backend('dask', scatter=[data, target]):
    search.fit(data, target)

This leads me to believe that the auto-scattering is causing a large amount of data to be passed through the scheduler at once.
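
One way to confirm where the data piles up (not from the issue, just a sketch that assumes client is the connected distributed Client) is to ask the scheduler process for its own RSS while the fit is running; psutil is a dependency of distributed, so it is available in that process:

def scheduler_rss():
    import os
    import psutil
    # Runs inside the scheduler process, so this reports the scheduler's own memory
    return psutil.Process(os.getpid()).memory_info().rss

print(f"scheduler RSS: {client.run_on_scheduler(scheduler_rss) / 1e9:.2f} GB")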

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
ogrisel commented, Feb 9, 2021

I released joblib 1.0.1 with this fix. Closing.

1 reaction
rikturr commented, Feb 9, 2021

Just tested - master is working great! I tried it out with a 5 GB object and the scheduler goes up to 5 GB for a few seconds, then back down once the workers take over. I was previously on 1.0, so something that happened on master since then seems to have fixed it 😃
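
As a closing note (not part of the thread), a quick way to confirm that both the client environment and the workers have picked up the fixed release is to compare joblib versions; client is again assumed to be the connected distributed Client:

import joblib

# The fix shipped in joblib 1.0.1, so both sides should report >= 1.0.1
print('client joblib:', joblib.__version__)
print('worker joblib:', client.run(lambda: __import__('joblib').__version__))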
