GPU-friendly loads / merge_frames
Just throwing this up here for now, need to investigate more.
I’m working on a distributed cudf join using UCX. Things progress fine until, in the Client process, we attempt to deserialize some data (I think the final result?). We end up calling `loads` with `deserialize=True`: https://github.com/dask/distributed/blob/fb30c33562862f30864456766424b44a3e91aa5b/distributed/protocol/core.py#L101
which calls `merge_frames`: https://github.com/dask/distributed/blob/fb30c33562862f30864456766424b44a3e91aa5b/distributed/protocol/utils.py#L80
which attempts to convert the data to a byte string.
At this point in the client process, `frames` is a list of objects representing device memory. If possible (and I think it’s possible), I’d like to avoid copying to the host here.
Actually, this may only be possible if the Client happens to have a GPU as well. In this case that’s true, but not in general.
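To make the concern concrete, here is a minimal sketch of the kind of frame merging described above. It is not the code in `distributed/protocol/utils.py`; `merge_frames_sketch`, `nbytes` and `is_device_frame` are hypothetical names, and the sketch assumes that device frames expose `__cuda_array_interface__` and an `nbytes` attribute. The point is that regrouping frames by joining them into `bytes` needs host memory, while frames that already match the expected lengths can be passed through untouched.

```python
def nbytes(frame):
    """Size of a frame in bytes (assumes device frames expose `.nbytes`)."""
    n = getattr(frame, "nbytes", None)
    return n if n is not None else memoryview(frame).nbytes


def is_device_frame(frame):
    """Assumption: device buffers expose the CUDA Array Interface."""
    return hasattr(frame, "__cuda_array_interface__")


def merge_frames_sketch(lengths, frames):
    """Regroup `frames` so that output i holds exactly lengths[i] bytes."""
    if sum(lengths) != sum(nbytes(f) for f in frames):
        raise ValueError("frame sizes do not add up to the expected lengths")

    # Fast path: frames already line up with the expected lengths, so they
    # are returned as-is. Nothing is copied, which also works for device frames.
    if len(frames) == len(lengths) and all(
        nbytes(f) == n for f, n in zip(frames, lengths)
    ):
        return list(frames)

    # Slow path: regrouping joins pieces into host `bytes`. In this sketch we
    # refuse to do that for device frames instead of silently copying to host.
    if any(is_device_frame(f) for f in frames):
        raise TypeError("regrouping device frames would require a host copy")

    pending = [memoryview(f).cast("B") for f in reversed(frames)]
    out = []
    for n in lengths:
        parts = []
        while n:
            mv = pending.pop()
            if mv.nbytes <= n:
                parts.append(mv)
                n -= mv.nbytes
            else:
                # Split the frame: take what we need, push back the rest.
                parts.append(mv[:n])
                pending.append(mv[n:])
                n = 0
        out.append(b"".join(parts))  # this join is the host-side copy
    return out
```

With host frames, `merge_frames_sketch([3, 4], [b"ab", b"cde", b"fg"])` regroups the data into two `bytes` objects, whereas a list of frames that already matches `lengths` comes back as the same objects.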
TODO:
- figure out exactly where the client is calling this
- …
Issue Analytics
- Created 5 years ago
- Comments:14 (14 by maintainers)
Top GitHub Comments
I think we’ll close this and reopen if we come across it with the current implementations.
To the spirit of the issue, we do have `cuda_dumps` and `cuda_loads`. Thus far `merge_frames` doesn’t behave how we would expect ( https://github.com/dask/distributed/issues/3580 ), so we mostly avoid it. That said, PR ( https://github.com/dask/distributed/pull/3732 ) has merge and split frame style functions. So maybe that solves that piece of this issue?
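For readers unfamiliar with the idea, a split-frame style function could look roughly like the sketch below. This is only an illustration for host buffers, not the code from that PR; `split_frame_sketch` and `max_size` are hypothetical names. It slices one large frame into views of bounded size without copying, and the recorded lengths are what a merge step would later need to put the pieces back together.

```python
def split_frame_sketch(frame, max_size):
    """Split one host frame into zero-copy views of at most `max_size` bytes."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    mv = memoryview(frame).cast("B")
    # Slicing a memoryview creates new views onto the same buffer, no copy.
    parts = [mv[i:i + max_size] for i in range(0, mv.nbytes, max_size)]
    return parts or [mv]  # keep a single (empty) frame for empty input


# Usage: split, record the piece sizes, then merge back on the host.
payload = b"abcdefg"
parts = split_frame_sketch(payload, 3)
lengths = [p.nbytes for p in parts]
restored = b"".join(parts)
```

Here the final `b"".join` stands in for the merge step on the host; a device-aware version would need an equivalent that does not go through `bytes`.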