Worker memory not being freed when tasks complete
I’m still investigating, but in the meantime I wanted to get this issue started.
I’m noticing that after executing a task graph with large inputs and a small output, my worker memory stays high. In the example below we:
- generate data (large byte strings)
- filter data (slice)
- reduce many tasks (sum)
So the final result returned to the client is small, a single Python int. The only large objects should be the initially generated bytestrings.
The console output below shows:
- per-worker memory usage before the computation (~30 MB)
- per-worker memory usage right after the computation (~230 MB)
- per-worker memory usage 5 seconds after that, in case things take some time to settle down (~230 MB)
Memory usage [before]
{'tcp://192.168.7.20:50533': '30.92 MB', 'tcp://192.168.7.20:50534': '30.95 MB'}
running
Memory usage [after]
{'tcp://192.168.7.20:50533': '231.97 MB',
'tcp://192.168.7.20:50534': '232.63 MB'}
Memory usage [after]
{'tcp://192.168.7.20:50533': '232.05 MB',
'tcp://192.168.7.20:50534': '232.63 MB'}
In an effort to test whether the scheduler or worker is holding a reference to the data, I submit a bunch of tiny inc tasks to one of the workers. I notice that the memory on that worker does settle down:
Memory usage [final]
{'tcp://192.168.7.20:52114': '232.77 MB',
'tcp://192.168.7.20:52115': '49.73 MB'}
That’s at least consistent with the worker or scheduler holding a reference to the data, but there could be many other causes. I’m still debugging.
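As a quick sanity check (separate from the reproduction script below), something like the following sketch can report whether each worker's in-memory store (Worker.data) is actually empty after the computation; inspect_worker here is just an illustrative helper, not part of the script.

def inspect_worker(dask_worker):
    # number of keys still held in the worker's in-memory data store, plus its RSS
    return len(dask_worker.data), dask_worker.monitor.proc.memory_info().rss

# client.run executes the function on every worker and returns a dict keyed
# by worker address, e.g.:
# pprint.pprint(client.run(inspect_worker))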
The number of inc tasks, 2731, seems to be significant. With 2730 inc tasks, I don’t see any memory reduction on that worker.
import time
from dask.utils import parse_bytes, format_bytes
import pprint
import string
import toolz
from distributed import Client, wait

N = parse_bytes("100 Mb")
I = 20


def inc(x):
    return x + 1


def f(x, n=N):
    # generate data: a large (~100 MB) byte string
    time.sleep(0.05)
    return string.ascii_letters[x % 52].encode() * n


def g(x):
    # filter data: keep only a small slice
    time.sleep(0.02)
    return x[:5]


def h(*args):
    # reduce: sum the first byte of each small slice
    return sum(x[0] for x in args)


def get_mem(dask_worker):
    # RSS of the worker process, as reported by its own monitor
    return dask_worker.monitor.proc.memory_info().rss


def main():
    dsk = {}
    for i in range(I):
        dsk[f'a-{i}'] = (f, i, N)
        dsk[f'b-{i}'] = (g, f'a-{i}')
    dsk['c-0'] = (h,) + tuple(f'b-{i}' for i in range(I))

    with Client(n_workers=2, threads_per_worker=1, memory_limit='500Mb',
                processes=True) as client:
        print("Memory usage [before]")
        pprint.pprint(toolz.valmap(format_bytes, client.run(get_mem)))

        print("running")
        client.get(dsk, keys=["c-0"])
        time.sleep(2)  # let things settle

        print("Memory usage [after]")
        pprint.pprint(toolz.valmap(format_bytes, client.run(get_mem)))

        time.sleep(5)  # settle some more?
        print("Memory usage [after]")
        pprint.pprint(toolz.valmap(format_bytes, client.run(get_mem)))

        print("clear things?")
        futures = client.map(inc, range(2731), pure=False)
        wait(futures)
        del futures

        print("Memory usage [final]")
        pprint.pprint(toolz.valmap(format_bytes, client.run(get_mem)))


if __name__ == '__main__':
    main()
Top GitHub Comments
This problem is indeed a big one, preventing me from using dask in production, where I have a very long-running task and 200 GB of memory get used up in no time. Is there anything at all you can suggest to investigate the issue or try to mitigate it?
I already tried the suggested PR without success. My structure is made of nested multiprocessing on different layers, like:
- A1 runs in parallel (B1, B2)
- B1 runs in parallel (C1, C2)
- B2 runs in parallel (C3, C4)
- C1 runs in parallel (D1, D2)
- C2 runs in parallel (D3, D4)
- C3 runs in parallel (D6, D5)
- C4 runs in parallel (D7, D8)
Everything works fine except that once the tasks in the innermost layer D are completed, the memory never gets released and it accumulates until the kernel dies. In this situation of nested processes I cannot even restart the client in the inner layers, because that would end up affecting the whole computation. So for me there is really no solution here.
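For concreteness, the shape of this is roughly the sketch below; the layer_* functions are placeholders for my real code, not the actual implementation. The nested submission uses distributed's worker_client so a task can launch further tasks without blocking the worker thread.

# Rough sketch only: layer_a/layer_b/layer_c/layer_d stand in for my real functions.
from distributed import Client, worker_client

def layer_d(i):
    return i  # innermost work; this is where memory accumulates in my case

def layer_c(i):
    with worker_client() as client:  # submit tasks from inside a task
        futures = client.map(layer_d, [2 * i, 2 * i + 1])
        return sum(client.gather(futures))

def layer_b(i):
    with worker_client() as client:
        futures = client.map(layer_c, [2 * i, 2 * i + 1])
        return sum(client.gather(futures))

def layer_a():
    with worker_client() as client:
        futures = client.map(layer_b, [0, 1])
        return sum(client.gather(futures))

if __name__ == '__main__':
    with Client() as client:
        print(client.submit(layer_a).result())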
Any help would be much appreciated.
I’m also experiencing some kind of memory leak, though it might not be related. I’m using Dask distributed only as a job scheduler, not even passing any substantial data: the input is just a filename and there is no return value, and the job itself calls only bare pandas and numpy. This way I’m processing 4000 files (almost equally sized) on a 40-core machine in about 45 minutes.
With Dask distributed the memory usage continuously increases until the work is done. At that point it’s consuming 40 GB and the memory is not freed. I see
distributed.worker - WARNING - Memory use is high but worker has no data to store to disk. Perhaps some other process is leaking memory?
in the logs, which seems reasonable because Dask is not involved in any data manipulation. The strange thing is that I don’t experience the memory leak when using multiprocessing.Pool to dispatch those jobs. (And I also get better CPU usage.) It does consume about 10 GB which is not freed entirely, but the usage stays almost constant the whole time. So it seems that Dask is not directly involved, yet it somehow makes the difference.
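For reference, the dispatch pattern on both sides is roughly the sketch below; process_file is a stand-in for my actual pandas/numpy routine and the filenames are placeholders.

# Sketch only: process_file and the filenames are placeholders, not my real code.
from multiprocessing import Pool
from distributed import Client, wait

def process_file(path):
    # read one input file, do the pandas/numpy work, write results; no return value
    return None

filenames = [f"input_{i}.parquet" for i in range(4000)]  # placeholder paths

def run_with_dask():
    # worker memory keeps growing until all tasks have finished
    with Client(n_workers=40, threads_per_worker=1) as client:
        futures = client.map(process_file, filenames)
        wait(futures)

def run_with_pool():
    # same work dispatched through multiprocessing: memory stays roughly flat
    with Pool(processes=40) as pool:
        pool.map(process_file, filenames)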
I’m running Debian buster, Python 3.7 and the latest libraries (dask==2020.12.0, numpy==1.19.5, pandas==1.2.0). (Python 3.8 seems to make no difference.)
There are htop screenshots attached: leak-screenshots.zip.