Experiment with map_overlap and cupy arrays
It would be useful to try the da.map_overlap function with CuPy arrays on large 2D datasets.
A trivial example might look something like the following (untested):
```python
import dask.array as da
import cupy

# Swap cupy -> numpy here for a CPU comparison
rs = da.random.RandomState(RandomState=cupy.random.RandomState)

x = rs.random_sample((500000, 500000), chunks=(10000, 10000))
y = x.map_overlap(lambda x: x, depth=1)
y.sum().compute()  # trigger computation, but don't ask for the entire array as a result
```
My guess is that we'll be badly bound by communication. I would verify this by running the computation under the Dask distributed scheduler, started with dask-cuda's LocalCUDACluster, and then by watching the dashboard.

My hope is that once the UCX work finishes, this cost will go down considerably. It will be interesting to see by how much.
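For reference, a minimal sketch of that verification setup might look like the following (assuming dask-cuda and dask.distributed are installed; the `protocol` argument would only be needed for the UCX comparison):

```python
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

# Pass protocol="ucx" to switch the comms from TCP to UCX for comparison
cluster = LocalCUDACluster()  # one worker per visible GPU
client = Client(cluster)
print(client.dashboard_link)  # open this URL to watch communication costs
```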
Additionally, we might try using numba.cuda.jit to build a simple nearest-neighbor kernel function and apply it with map_overlap over the array. This notebook from this blogpost might be an interesting starting point (though there are probably more interesting operations).
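As a hedged sketch of what that could look like, here is an illustrative 3x3 mean filter; the `mean3x3_kernel` and `mean3x3` names are hypothetical, not from the issue, and `x` is assumed to be the CuPy-backed Dask array from the example above:

```python
import cupy
from numba import cuda

@cuda.jit
def mean3x3_kernel(arr, out):
    # Each thread averages the 3x3 neighborhood around one interior element
    i, j = cuda.grid(2)
    if 1 <= i < arr.shape[0] - 1 and 1 <= j < arr.shape[1] - 1:
        acc = 0.0
        for di in range(-1, 2):
            for dj in range(-1, 2):
                acc += arr[i + di, j + dj]
        out[i, j] = acc / 9.0

def mean3x3(chunk):
    # Numba CUDA kernels accept CuPy arrays via __cuda_array_interface__
    out = cupy.zeros_like(chunk)
    threads = (16, 16)
    blocks = ((chunk.shape[0] + 15) // 16, (chunk.shape[1] + 15) // 16)
    mean3x3_kernel[blocks, threads](chunk, out)
    return out

y = x.map_overlap(mean3x3, depth=1)  # depth=1 matches the 3x3 neighborhood
y.sum().compute()
```

The halo that map_overlap adds (and later trims) covers the border elements the kernel skips, so interior results are unaffected by chunk boundaries.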
cc @madsbk
The easiest way I’ve found to do this is to push the HTML files to a gh-pages branch and then go to username.github.io/repo-name/path-to-file.html
http://mrocklin.github.io/raw-host/map-overlap/map_overlap_10k_tcp.html
http://mrocklin.github.io/raw-host/map-overlap/map_overlap_10k_ucx.html
https://github.com/mrocklin/raw-host/commit/72f0876b88c2f7d4e4bc0b5e845811a28fc220cc
Have gone ahead and put together a simple benchmark script in PR (https://github.com/rapidsai/dask-cuda/pull/399). This should give us a way to track performance and measure improvements. Perhaps we can close this once that is in? New issues could follow up on more specific improvements as needed.