Generalized ufuncs don't always work for cupy chunked dask arrays
Issue summary
Dask generalized ufuncs do not work correctly when given Dask arrays containing cupy chunks.
This issue was first reported by @lrlunin in https://github.com/dask/dask-image/issues/275 (I’d prefer to transfer that issue here to avoid losing the discussion, but it appears that requires “write” instead of “triage” level access to both repos). I’ve copy-pasted the contents of my comment here.
Minimal reproducible example
You can run this quick test on Google Colab, if you don’t have a GPU on your local machine.
- From the Google Colab “Runtime” menu, click “Change Runtime Type” and check the runtime type is set to “GPU”.
- Install cupy into the notebook environment. Copy-paste
!pip install cupy-cuda11x
into a notebook cell and execute it.
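(Optional) To confirm the install worked and cupy can see the GPU before running the example, a quick check like the following should do (this snippet is my suggestion, not part of the original report):
import cupy as cp
cp.cuda.runtime.getDeviceCount()  # should return 1 (or more) on a GPU runtime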
I used this docstring example to make a minimal, reproducible test.
It works as expected with a numpy-backed dask array:
import dask.array as da
import numpy as np
def outer_product(x, y):
    return np.einsum("i,j->ij", x, y)
a = da.random.normal(size=(20,30), chunks=(10, 30))
b = da.random.normal(size=(10, 1,40), chunks=(5, 1, 40))
c = da.apply_gufunc(outer_product, "(i),(j)->(i,j)", a, b, vectorize=True)
c.compute().shape
# Expected output: (10, 20, 30, 40)
# Works as expected
It fails with a cupy-backed dask array:
import dask.array as da
import cupy as cp
def outer_product(x, y):
    return cp.einsum("i,j->ij", x, y)
data_a = cp.random.normal(size=(20,30))
a = da.from_array(data_a, chunks=(10, 30))
data_b = cp.random.normal(size=(10, 1,40))
b = da.from_array(data_b, chunks=(5, 1, 40))
c = da.apply_gufunc(outer_product, "(i),(j)->(i,j)", a, b, vectorize=True)
c.compute().shape
# Expected output: (10, 20, 30, 40)
# TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy array explicitly.
Notably, this other docstring example does work with a dask array containing cupy chunks. That example does not use the vectorize keyword argument, which is probably why we don't see the problem there.
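For reference, here is a rough sketch of what such a non-vectorized gufunc call looks like with cupy chunks (adapted from the mean_std example in the apply_gufunc docstring; this adaptation is mine, not from the original discussion). Because the function reduces over the core dimension itself, vectorize=True, and therefore np.vectorize, never comes into play:
import dask.array as da
import cupy as cp

def mean_std(x):
    # Reduces over the core dimension itself, so no per-element
    # vectorization is needed and the chunks stay as cupy arrays.
    return cp.mean(x, axis=-1), cp.std(x, axis=-1)

data = cp.random.normal(size=(10, 20, 30))
a = da.from_array(data, chunks=(5, 10, 30))
mean, std = da.apply_gufunc(mean_std, "(i)->(),()", a)
mean.compute().shape
# (10, 20)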
Hypothesis
There are several lines in dask/array/gufunc.py that use numpy-specific functions (np.vectorize and np.newaxis; see here, here, and here). These lines might be causing this problem for dask arrays containing cupy chunks.
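As a quick sanity check of this hypothesis (my own experiment, not part of the original traceback), wrapping a function with np.vectorize and handing it cupy arrays appears to reproduce the same implicit-conversion error, since np.vectorize converts its arguments to numpy arrays internally:
import numpy as np
import cupy as cp

def outer_product(x, y):
    return cp.einsum("i,j->ij", x, y)

vec = np.vectorize(outer_product, signature="(i),(j)->(i,j)")
vec(cp.random.normal(size=(20,)), cp.random.normal(size=(40,)))
# Expected to raise:
# TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` ...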
What we’ve tried so far
To test this idea, I tried switching out the np.vectorize calls for the cupy.vectorize function.
Unfortunately, cupy gives me this error:
NotImplementedError: cupy.vectorize does not support `signature` option currently.
See here in the cupy source code.
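Concretely, the substitution I tried boils down to something like this sketch (simplified from what dask does internally):
import cupy as cp

def outer_product(x, y):
    return cp.einsum("i,j->ij", x, y)

# dask's gufunc machinery relies on the `signature` option, which cupy rejects:
vec = cp.vectorize(outer_product, signature="(i),(j)->(i,j)")
# NotImplementedError: cupy.vectorize does not support `signature` option currently.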
What to do next?
We could make a feature request issue in the cupy repository, asking for support for the signature keyword argument in cupy.vectorize. I don't know where that request would fit into their other priorities.
If that were implemented, Dask could then consider replacing the three numpy-specific lines in dask/array/gufunc.py with some sort of dispatching solution, so that np.vectorize is used for numpy-backed dask arrays and cupy.vectorize is used for cupy-backed dask arrays. A rough sketch of what that could look like is below.
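For illustration only, such a dispatch could look roughly like the sketch below. The helper name and the idea of inspecting the dask array's ._meta attribute are my assumptions, not existing dask API:
import numpy as np

def get_vectorize(meta):
    # Hypothetical helper: choose a vectorize implementation based on the
    # array type backing the dask array (its ._meta attribute).
    try:
        import cupy
        if isinstance(meta, cupy.ndarray):
            # Only viable once cupy.vectorize supports `signature`.
            return cupy.vectorize
    except ImportError:
        pass
    return np.vectorize
apply_gufunc could then call something like get_vectorize(x._meta) instead of hard-coding np.vectorize at the three places linked above.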
Environment:
- Dask version: 2022.2.0
- Python version: 3.8.15
- Operating System: Google Colab is running Linux version #1 SMP Fri Aug 26 08:44:51 UTC 2022
- Install method (conda, pip, source):
- dask 2022.2.0 came pre-installed on Google Colab (presumably installed with pip)
- pip install cupy-cuda11x (installs version cupy-cuda11x-11.3.0)
- CUDA information:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
Top GitHub Comments
You may already be doing this, but are you using the most recent versions of dask & distributed (version 2022.11.0 or above)? There have been some recent changes that dramatically improve memory use during computation (more details in this blogpost written by Gabe Joseph, “Reducing memory usage in Dask workloads by 80%”).
You should comment on the discussion thread here, where Gabe is asking for community feedback on workloads that do (or don’t!) work. It might be possible to adjust the config setting values to better suit your workload, but you’ll get better advice if you post on that thread. Try it first with the default settings for dask & distributed 2022.11.0 though, if you haven’t already done that.
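If you do end up adjusting settings, I believe the relevant one from that blogpost is the scheduler worker-saturation value (this is my assumption of which setting is meant; check the discussion thread for the current recommendation), e.g.:
import dask

# Assumed relevant setting; distributed >= 2022.11.0 already ships a tuned
# default, so only change it after reading the discussion thread.
dask.config.set({"distributed.scheduler.worker-saturation": 1.1})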
I hope this doesn’t feel too much like we keep sending you off to different places, but it is the best way to get each aspect of your problem (gufunc cupy compatibility, memory management, & CUDA parallelism questions) in front of the right audience for advice. I appreciate you doing this.
cc @pentschev