Distributed LocalCluster's `memory_limit` keyword argument needs documentation
See original GitHub issue
Minimal Complete Verifiable Example:
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=2,
                       threads_per_worker=4,
                       memory_target_fraction=0.95,
                       memory_limit='32GB')
client = Client(cluster)
client
What happened:
It looks like the memory_limit keyword argument used here sets the limit for the entire cluster (see the screenshot below). If that's the case, it would be helpful to add it to the LocalCluster documentation here.
Edit: It sets the limit per worker. My example is a special case because my machine has a maximum of 16 GB, so we can't go beyond that (see @jcrist's comments below for more details). It would still be useful to document this behavior.
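To confirm the per-worker behavior, here is a minimal sketch (not part of the original report; the two-worker count and the "4GB" value are placeholder assumptions) that asks the scheduler what limit each worker actually received:

from dask.distributed import Client, LocalCluster

# Use a limit well below total system memory so it is not capped.
cluster = LocalCluster(n_workers=2, memory_limit="4GB")
client = Client(cluster)

# scheduler_info() lists one entry per worker; each entry carries its own
# memory_limit (in bytes), i.e. the keyword is applied per worker.
for address, info in client.scheduler_info()["workers"].items():
    print(address, info["memory_limit"])

client.close()
cluster.close()

If the keyword applied to the whole cluster, the reported per-worker values would be expected to sum to the requested total rather than each equaling it.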
Anything else we need to know?:
Possible causes of confusion:
- Based on distributed/deploy/local.py, it sounds like it should be set per worker?
- dask-worker has a CLI option to set the memory limit (example below), which was the only other mention of the keyword I could find in the docs.
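For reference, the dask-worker CLI option mentioned above looks like this (the scheduler address and the limit are placeholders, not values from the issue):

dask-worker tcp://127.0.0.1:8786 --memory-limit 4GB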
StackOverflow question that surfaced this issue is here.
Screenshot:
Environment:
- Dask version: 2021.09.1
- Python version: 3.9.7
- Operating System: macOS
- Install method (conda, pip, source): conda
Issue Analytics
- State: Closed
- Created: 2 years ago
- Reactions: 1
- Comments: 13 (9 by maintainers)
I just noticed that too. This is because we take the min of the user input and the total available system memory: https://github.com/dask/distributed/blob/defe454f63199799b403a3ddeee04b473adf0dfd/distributed/worker.py#L3805
So if your machine has 16 GiB of RAM, each worker is limited to at most 16 GiB of RAM even if memory_limit="32 GiB" is passed (a rough sketch of this capping is shown below).
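For illustration only, a rough sketch of that capping, assuming the user value is parsed with dask.utils.parse_bytes and compared against the machine's total memory via psutil (the actual implementation lives at the worker.py link above and handles more cases, e.g. fractions and "auto"):

import psutil
from dask.utils import parse_bytes

def effective_memory_limit(user_limit: str) -> int:
    # Parse a human-readable limit such as "32GB" into bytes, then cap it
    # at the total system memory, mirroring the min() described above.
    requested = parse_bytes(user_limit)
    system_total = psutil.virtual_memory().total
    return min(requested, system_total)

# On a 16 GiB machine this returns ~16 GiB even though 32 GB was requested.
print(effective_memory_limit("32GB"))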
This is what happened in @pavithraes' case above, and I think it is the correct behavior (but it could also be called out in the docstring).
Closing as completed. Thanks, @crislanarafael!