[Question] A new approach to memory spilling
Question: Would the Dask/Distributed community be interested in an improved memory spilling model that fixes the shortcomings of the current one but makes use of proxy object wrappers?
In Dask-CUDA we have introduced a new approach to memory spilling that handles object aliasing and JIT memory un-spilling: https://github.com/rapidsai/dask-cuda/pull/451
The result is memory spilling that:
- Avoids double counting: https://github.com/dask/distributed/issues/4186
- Avoids spilling the same object multiple times.
- Avoids memory spikes caused by an incorrect memory tally.
- Implements just-in-time un-spilling: https://github.com/dask/distributed/pull/3998
- Supports communication of spilled data, so that GPU data doesn't have to be un-spilled just to be spilled again as part of communication: https://github.com/rapidsai/dask-cuda/issues/342
- Is a first step toward partially spilled objects, such as spilling individual columns of a data frame: https://github.com/BlazingDB/blazingsql/issues/1128
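To make the mechanism behind these points concrete, here is a minimal sketch of the core idea. It is an illustration only, not Dask-CUDA's actual ProxyObject implementation, and it assumes pickle-based serialization where the real code handles CUDA device memory and Dask's serialization protocols: a proxy wraps an object, can serialize ("spill") it on demand, and transparently deserializes ("un-spills") it the first time the wrapped object is used.

import pickle

import numpy as np


class ProxyObject:
    """Wrap an object so it can be spilled and un-spilled on demand."""

    def __init__(self, obj):
        self._obj = obj        # the in-memory object, or None while spilled
        self._spilled = None   # the serialized bytes while spilled

    def spill(self):
        """Serialize the wrapped object and drop the in-memory reference."""
        if self._obj is not None:
            self._spilled = pickle.dumps(self._obj)
            self._obj = None

    def unspill(self):
        """Deserialize on first use -- the "just-in-time" part."""
        if self._obj is None:
            self._obj = pickle.loads(self._spilled)
            self._spilled = None
        return self._obj

    def __getattr__(self, name):
        # Any attribute access on the proxy un-spills the wrapped object.
        return getattr(self.unspill(), name)


p = ProxyObject(np.arange(3))
p.spill()         # the array now exists only in serialized form
print(p.sum())    # attribute access un-spills just in time and prints 3

Since tasks receive the proxy rather than the raw object, the data can stay spilled until something actually touches it.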
The current implementation in Dask-CUDA handles CUDA device objects, but it is possible to generalize the approach to also handle spilling to disk.
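That disk generalization could look roughly like the hypothetical sketch below, where DiskProxy and SpillManager are illustrative names rather than any existing Dask-CUDA API: a manager tracks each proxy exactly once and, when a memory threshold is exceeded, spills the oldest in-memory ones to temporary files. Tracking each object exactly once is what rules out double counting and repeated spilling.

import os
import pickle
import tempfile


class DiskProxy:
    """Like the sketch above, but spills to a temporary file on disk."""

    def __init__(self, obj, nbytes):
        self.obj = obj        # None while the object lives on disk
        self.nbytes = nbytes  # size used for the memory tally
        self.path = None

    def spill(self):
        if self.obj is not None:
            fd, self.path = tempfile.mkstemp(suffix=".spill")
            with os.fdopen(fd, "wb") as f:
                pickle.dump(self.obj, f)
            self.obj = None

    def unspill(self):
        if self.obj is None:
            with open(self.path, "rb") as f:
                self.obj = pickle.load(f)
            os.remove(self.path)
            self.path = None
        return self.obj


class SpillManager:
    """Track every proxy once and spill oldest-first past a memory limit."""

    def __init__(self, memory_limit):
        self.memory_limit = memory_limit
        self.proxies = []  # insertion order doubles as a crude LRU

    def add(self, obj, nbytes):
        proxy = DiskProxy(obj, nbytes)
        self.proxies.append(proxy)
        self._maybe_spill()
        return proxy

    def _in_memory(self):
        return sum(p.nbytes for p in self.proxies if p.obj is not None)

    def _maybe_spill(self):
        for proxy in self.proxies:
            if self._in_memory() <= self.memory_limit:
                break
            proxy.spill()  # already-spilled proxies are a cheap no-op


mgr = SpillManager(memory_limit=100)
a = mgr.add(list(range(50)), nbytes=80)
b = mgr.add(list(range(50)), nbytes=80)  # total 160 > 100, so `a` is spilled
assert a.obj is None and b.obj is not None
print(a.unspill()[:3])                   # [0, 1, 2], read back from disk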
The disadvantage of this approach is that proxy objects get exposed to users. The inputs to a task might be wrapped in a proxy object, which doesn't mimic the proxied object perfectly. E.g.:
# Type checking using isinstance() works as expected, but direct type checking doesn't:
>>> import numpy as np
>>> from dask_cuda.proxy_object import asproxy
>>> x = np.arange(3)
>>> isinstance(asproxy(x), type(x))
True
>>> type(asproxy(x)) is type(x)
False
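For intuition on the behavior above: one way a proxy can make isinstance() succeed while direct type checking still fails is to override __class__ with a property, a trick used by several wrapper libraries. The sketch below shows the general technique and may differ from how ProxyObject actually implements it.

import numpy as np


class Proxy:
    def __init__(self, obj):
        self._obj = obj

    @property
    def __class__(self):
        # isinstance() consults __class__, so the proxy reports the
        # proxied object's type here...
        return type(self._obj)


p = Proxy(np.arange(3))
assert isinstance(p, np.ndarray)   # True: __class__ says ndarray
assert type(p) is not np.ndarray   # ...but type() ignores __class__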
Because of this, the approach shouldn't be enabled by default. But do you think the Dask community would be interested in a generalization of this approach, or is the proxy object hurdle too much of an issue?
cc. @mrocklin, @jrbourbeau, @quasiben
Top GitHub Comments
This may also provide other benefits, like allowing for async reading/writing, worker scheduling that is sensitive to which data is already in fast memory, and pre-fetching data from slow memory.
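For illustration, the pre-fetching idea could be as simple as the hypothetical sketch below, which assumes proxies exposing an unspill() method like the sketches earlier in the thread; prefetch and its thread pool are not an existing API.

from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)


def prefetch(proxies):
    """Start un-spilling each proxy in the background, without blocking."""
    return [_pool.submit(proxy.unspill) for proxy in proxies]

# A worker about to run a task could call prefetch() on the task's inputs,
# then wait on the returned futures only when the data is actually needed.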
If this is very effective for shuffle workloads, then maybe it's something that we could implement just for that code path? That might be a tightly scoped place to try this out before applying it more broadly.