Out of memory / memory leak debugging
Hi,
This is less of a clear bug report and more a writeup of some debugging I recently did around weird memory-leak-like issues while running code using dask. I hope this will save someone a bit of time in the future.
I’m trying to use dask to run some simple code in parallel, as a better multiprocessing. Roughly this:
from dask import delayed

def fun(x):
    return x + 1

tasks = [delayed(fun)(i) for i in range(1000)]
futs = client.compute(tasks)  # client is a dask.distributed Client
In practice fun is a bit more complex: it reads data from s3, does a bit of computation and writes results back to s3.
When running the real example in parallel I was seeing really slow scheduling, and workers slowly ran out of memory. The problem seemed to scale roughly with the number of tasks: each task takes around 30 seconds on a single core, and when I ran 10 of them everything worked perfectly. 500 still worked, 1000 was borderline, and 10000 certainly didn’t.
My workers died with various exceptions related to memory usage, e.g.
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
After a bit of debugging this seems to be caused by a helper class that got passed into fun. A minimal example that breaks looks like this:
import s3fs
from dask import delayed
from dask.distributed import progress

class S3FsWrapper(object):
    def __init__(self):
        self.fs = s3fs.S3FileSystem()

    def get_s3fs(self):
        return self.fs

fs = S3FsWrapper()

def fun(fs, x):
    # would do something with fs here, but not necessary to trigger OOM
    return x + 1

tasks = [delayed(fun)(fs, i) for i in range(1000)]
futs = client.compute(tasks)
progress(futs)
Running this will take ages and, depending on how much RAM you have, will most likely crash.
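If you want to reproduce this without taking down your machine, it helps to cap the local workers first. This is just a sketch of a small memory-capped setup with made-up limits; the notebook linked below may configure the cluster differently:

from dask.distributed import Client, LocalCluster

# Small, memory-capped local cluster so the repro fails fast
# instead of swapping the whole machine.
cluster = LocalCluster(n_workers=4, threads_per_worker=1, memory_limit="1GB")
client = Client(cluster)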
So I had a look at where all this memory goes with pympler.muppy:
def debug_mem():
    from pympler import summary, muppy
    all_objects = muppy.get_objects()
    s = summary.summarize(all_objects)
    return s

s = client.run(debug_mem)

from pympler import summary, muppy
summary.print_(list(s.values())[0])
types | # objects | total size
======================================================= | =========== | ============
<class 'collections.OrderedDict | 372201 | 163.90 MB
<class 'str | 598212 | 48.18 MB
<class 'dict | 90573 | 23.05 MB
<class 'list | 95265 | 7.50 MB
<class '_io.BufferedWriter | 3 | 4.25 MB
<class 'code | 25294 | 3.49 MB
<class 'type | 3317 | 3.39 MB
<class 'botocore.hooks.NodeList | 19000 | 1.45 MB
<class 'tuple | 22560 | 1.42 MB
<class 'set | 2921 | 1.28 MB
<class 'cell | 22473 | 1.03 MB
<class 'botocore.docs.docstring.ClientMethodDocstring | 7700 | 789.55 KB
<class 'weakref | 5410 | 422.66 KB
<class 'botocore.model.OperationModel | 7700 | 421.09 KB
<class 'int | 8753 | 261.06 KB
It looks like every task instance has loaded its own copy of botocore. All the strings contain AWS API descriptions, and I suspect the OrderedDicts are similar.
So this is how far I’ve gotten. Runnable notebook is at https://github.com/ah-/notebooks/blob/master/dask_oom.ipynb.
I have some ideas about what exactly is going on underneath, but I’d be grateful for a clear explanation, and maybe some hints on how to avoid this. I suspect this isn’t actually a dask bug but a side-effect of how data is serialised and passed around.
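One way to check the serialisation suspicion (a sketch of my own, assuming cloudpickle is installed, which dask uses to serialise functions and their closures, and reusing fun and fs from above): pickle a single task payload by hand and look at what comes back when it is loaded again.

import cloudpickle

# Roughly what gets shipped for a single task: the function plus its
# arguments, including the S3FsWrapper instance.
blob = cloudpickle.dumps((fun, fs, 0))
print(f"{len(blob)} bytes per task on the wire")

# Every loads() on a worker rebuilds the wrapper and its S3FileSystem,
# which seems to be where the per-task botocore objects come from.
restored_fun, restored_fs, _ = cloudpickle.loads(blob)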
Top GitHub Comments
I recommend raising an issue upstream on s3fs noting that creating many S3FileSystems (or at least deserializing them) seems to make many botocore objects, and asking if there is a place where you can help to correct the issue.
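One way to sidestep the problem in the meantime (a sketch, not something spelled out in the thread; it reuses delayed and client from above): don’t ship the wrapper with every task at all, and construct the S3FileSystem inside the task instead.

import s3fs

def fun(x):
    # Construct the filesystem inside the task: only the tasks currently
    # running hold an instance, instead of every queued task carrying its
    # own deserialised copy of the wrapper.
    fs = s3fs.S3FileSystem()
    # ... read from / write results back to s3 with fs here ...
    return x + 1

tasks = [delayed(fun)(i) for i in range(1000)]
futs = client.compute(tasks)

If constructing the client per task turns out to be too slow, caching one instance per worker process (for example via a module-level cache) gets the cost back down to roughly once per worker.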
OK, I can reproduce the issue. Some details:
- This helps when testing on larger systems.
- When I watch the diagnostic dashboard I notice that memory jumps up quickly before any of the computations start. I suspect that this means the memory cost isn’t in the results on the workers, it’s in the deserialized versions of the tasks themselves (the many Python functions). Generally we don’t have any controls on data like this that we expect to be small. I am not surprised to learn that Dask crashes here.
- One thing that may help here would be to do a bit of caching on deserialization: “Hey, I’ve seen this huge string of bytes recently, it turned into this function, I’ll just return that immediately rather than deserialize it again.” But this will likely have complications of its own.
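A toy sketch of what such a deserialisation cache could look like (just the idea from the comment above, not anything dask actually does):

import hashlib
import cloudpickle

_seen = {}

def cached_loads(blob):
    # Return the previously deserialised object if we have seen these exact
    # bytes before, instead of paying the deserialisation cost once per task.
    key = hashlib.sha1(blob).digest()
    if key not in _seen:
        _seen[key] = cloudpickle.loads(blob)
    return _seen[key]

The obvious complications are that the cached object is then shared between tasks, which is only safe if nothing mutates it, and that an unbounded cache eventually needs eviction.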
I’m tempted to say “just don’t send hundreds of thousands of tasks that close over non-trivial data”.
This isn’t that big serialized, but the serialization time is non-trivial and I wouldn’t be surprised if it’s much bigger when in memory.
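For what it’s worth, the usual way to follow that advice (a sketch, not from the thread; it reuses fun, fs, delayed and client from above) is to send the shared object to the cluster once with Client.scatter and pass the resulting future into the tasks, rather than closing over the object itself:

# Ship the wrapper to the workers once, then reuse the same copy everywhere.
[fs_future] = client.scatter([fs], broadcast=True)

tasks = [delayed(fun)(fs_future, i) for i in range(1000)]
futs = client.compute(tasks)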