Distributed LocalCluster's `memory_limit` keyword argument needs documentation
See original GitHub issue
Minimal Complete Verifiable Example:
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=2,
                       threads_per_worker=4,
                       memory_target_fraction=0.95,
                       memory_limit='32GB')
client = Client(cluster)
client
What happened:
It looks like the memory_limit keyword argument used here sets the limit for the entire cluster (see the screenshot below). If that's the case, it would be helpful to add it to the LocalCluster documentation here.
Edit: It sets the limit per worker. My example is a special case because my machine has a maximum of 16 GB, so we can't go beyond that (see @jcrist's comments below for more details). It would still be useful to document this behavior.
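To confirm the per-worker behavior, here is a minimal sketch (not part of the original report; the two-worker count and the "4GB" value are placeholder assumptions) that asks the scheduler what limit each worker actually received:

from dask.distributed import Client, LocalCluster

# Use a limit well below total system memory so it is not capped.
cluster = LocalCluster(n_workers=2, memory_limit="4GB")
client = Client(cluster)

# scheduler_info() lists one entry per worker; each entry carries its own
# memory_limit (in bytes), i.e. the keyword is applied per worker.
for address, info in client.scheduler_info()["workers"].items():
    print(address, info["memory_limit"])

client.close()
cluster.close()

If the keyword applied to the whole cluster, the reported per-worker values would be expected to sum to the requested total rather than each equaling it.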
Anything else we need to know?:
Possible causes of confusion:
- Based on distributed/deploy/local.py, it sounds like it should be set per worker?
- dask-worker has a CLI option to set the memory limit (example below), which was the only other mention of the keyword I could find in the docs.
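For reference, the dask-worker CLI option mentioned above looks like this (the scheduler address and the limit are placeholders, not values from the issue):

dask-worker tcp://127.0.0.1:8786 --memory-limit 4GB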
StackOverflow question that surfaced this issue is here.
Screenshot:
Environment:
- Dask version: 2021.09.1
- Python version: 3.9.7
- Operating System: macOS
- Install method (conda, pip, source): conda
Issue Analytics
- State: Closed
- Created: 2 years ago
- Reactions: 1
- Comments: 13 (9 by maintainers)
I just noticed that too. This is because we take the min of the user input and the total available system memory: https://github.com/dask/distributed/blob/defe454f63199799b403a3ddeee04b473adf0dfd/distributed/worker.py#L3805
So if your machine has 16 GiB of RAM, each worker is limited to at most 16 GiB of RAM even if memory_limit="32 GiB" is passed (a rough sketch of this capping is shown below).
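For illustration only, a rough sketch of that capping, assuming the user value is parsed with dask.utils.parse_bytes and compared against the machine's total memory via psutil (the actual implementation lives at the worker.py link above and handles more cases, e.g. fractions and "auto"):

import psutil
from dask.utils import parse_bytes

def effective_memory_limit(user_limit: str) -> int:
    # Parse a human-readable limit such as "32GB" into bytes, then cap it
    # at the total system memory, mirroring the min() described above.
    requested = parse_bytes(user_limit)
    system_total = psutil.virtual_memory().total
    return min(requested, system_total)

# On a 16 GiB machine this returns ~16 GiB even though 32 GB was requested.
print(effective_memory_limit("32GB"))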
This is what happened in @pavithraes' case above, and I think it is the correct behavior (but it could also be called out in the docstring).
Closing as completed. Thanks, @crislanarafael!