memory leak when using distributed.Client with delayed
See original GitHub issue
I have used dask.delayed to wire together some classes, and when using dask.threaded.get everything works properly. When the same code is run using distributed.Client, the memory used by the process keeps growing.
Dummy code to reproduce the issue is below.
import gc
import os

import psutil

from dask import delayed


# generate random strings: https://stackoverflow.com/a/16310739
class Data():
    def __init__(self):
        self.tbl = bytes.maketrans(bytearray(range(256)),
                                   bytearray([ord(b'a') + b % 26 for b in range(256)]))

    @staticmethod
    def split_len(seq, length):
        return [seq[i:i + length] for i in range(0, len(seq), length)]

    def get_data(self):
        l = self.split_len(os.urandom(1000000).translate(self.tbl), 1000)
        return l


class Calc():
    def __init__(self, l):
        self.l = l

    def nth_nth_item(self, n):
        return self.l[n][n]


class Combiner():
    def __init__(self):
        self.delayed_data = delayed(Data())

    def get_calc(self):
        d_l = self.delayed_data.get_data(pure=True)
        return delayed(Calc, pure=True)(d_l)

    def mem_usage_mb(self):
        process = psutil.Process(os.getpid())
        return "%.2f" % (process.memory_info().rss * 1e-6)

    def results(self):
        return {
            '0': self.get_calc().nth_nth_item(0),
            '1': self.get_calc().nth_nth_item(1),
            '2': self.get_calc().nth_nth_item(2),
            'mem_usage_mb': self.mem_usage_mb()
        }

    def delayed_results(self):
        return delayed(self.results())


def main_threaded_get():
    from dask.threaded import get as threaded_get
    from dask import compute
    for i in range(300):
        delayed_obj = Combiner().delayed_results()
        res = compute(delayed_obj, scheduler=threaded_get)[0]  # local threaded scheduler
        # print(res)
        print("#%d, mem: %s mb" % (i, res['mem_usage_mb']))
        gc.collect()


def main_distributed_client():
    from distributed import Client
    client = Client(processes=True, n_workers=1, threads_per_worker=1)
    for i in range(1000):
        delayed_obj = Combiner().delayed_results()
        future = client.compute(delayed_obj)
        res = future.result()
        print("#%d, mem: %s mb" % (i, res['mem_usage_mb']))
        collect_res = client.run(lambda: gc.collect())  # doesn't help
        # print(collect_res)


if __name__ == "__main__":
    main_threaded_get()
    main_distributed_client()
Results:
main_threaded_get():
100, mem: 33.64 mb
200, mem: 33.64 mb
299, mem: 33.64 mb
main_distributed_client()
100, mem: 94.02 mb
200, mem: 96.02 mb
300, mem: 97.95 mb
400, mem: 100.11 mb
500, mem: 102.29 mb
600, mem: 104.48 mb
700, mem: 106.72 mb
800, mem: 108.20 mb
900, mem: 110.02 mb
999, mem: 112.22 mb
I also see "distributed.utils_perf - WARNING - full garbage collections took 60% CPU time recently (threshold: 10%)" messages, starting at around i=30.
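The garbage-collection pressure behind that warning can also be checked from inside the worker process; a small sketch, assuming the same Client as in the script above:

import gc
from distributed import Client

client = Client(processes=True, n_workers=1, threads_per_worker=1)

# gc.get_stats() reports, per generation, how many collections have run and how
# many objects were collected; client.run() executes the function in each worker.
print(client.run(lambda: gc.get_stats()))
print(client.run(lambda: gc.get_count()))  # current gc counters, per generation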
Python 3.6.5
>>> dask.__version__
'0.18.0'
>>> distributed.__version__
'1.22.0'
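While the leak itself is being tracked down, one blunt workaround is to restart the worker processes periodically so their memory is returned to the OS. A minimal sketch, assuming the Combiner class from the reproduction script above:

from distributed import Client

client = Client(processes=True, n_workers=1, threads_per_worker=1)

for i in range(1000):
    delayed_obj = Combiner().delayed_results()
    res = client.compute(delayed_obj).result()
    print("#%d, mem: %s mb" % (i, res['mem_usage_mb']))
    if i > 0 and i % 100 == 0:
        # Client.restart() kills and relaunches the workers and clears all
        # scheduler state; any outstanding futures are cancelled.
        client.restart()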
Top GitHub Comments
@Axel-CH I’ve also noticed a mismatch between the memory usage reported by dask distributed and the OS. What helped me resolve problems with frozen and killed workers was to change the configuration described here to the following:
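The actual configuration values are not preserved in this excerpt. As a rough illustration only (the numbers below are assumptions, not the commenter's settings), worker memory thresholds can be tuned through dask.config before the cluster is created:

import dask
from distributed import Client

# Illustrative fractions of the per-worker memory limit; tune to taste.
dask.config.set({
    "distributed.worker.memory.target": 0.60,     # start spilling data to disk
    "distributed.worker.memory.spill": 0.70,      # spill based on process memory
    "distributed.worker.memory.pause": 0.80,      # stop accepting new tasks
    "distributed.worker.memory.terminate": 0.95,  # nanny restarts the worker
})

# The settings must be in place before the workers are started.
client = Client(processes=True, n_workers=1, threads_per_worker=1, memory_limit="2GB")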
Code for: mleak.py
The modified script compares memory usage, using tracemalloc, before and after computing the delayed function.
If I'm interpreting the tracemalloc results correctly, it looks like memory usage grows when pickle.loads is called.
Run:
python -X tracemalloc mleak.py
Top memory increases per invocation:
Call stack for distributed/protocol/pickle.py (different invocation)
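For context, a minimal sketch of the kind of comparison such a script can do (this is not the actual mleak.py, whose diff is not preserved here): take a tracemalloc snapshot before and after repeated compute calls and print the largest differences. Using processes=False keeps the worker in the same process so its allocations are visible to tracemalloc; the original report used processes=True.

import os
import tracemalloc

from dask import delayed
from distributed import Client


def make_block():
    # roughly 1 MB of throwaway data per task, loosely mimicking Data.get_data()
    return os.urandom(1000000)


if __name__ == "__main__":
    client = Client(processes=False, n_workers=1, threads_per_worker=1)

    tracemalloc.start(25)  # keep 25 frames so call stacks are meaningful
    before = tracemalloc.take_snapshot()

    for _ in range(200):
        delayed(make_block)().compute()

    after = tracemalloc.take_snapshot()
    for stat in after.compare_to(before, "traceback")[:5]:
        print(stat)
        for line in stat.traceback.format():
            print("   ", line)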