Generalized ufuncs don't always work for cupy chunked dask arrays
Issue summary
Dask generalized ufuncs do not work correctly when given Dask arrays containing cupy chunks.
This issue was first reported by @lrlunin in https://github.com/dask/dask-image/issues/275 (I’d prefer to transfer that issue here to avoid losing the discussion, but it appears that requires “write” instead of “triage” level access to both repos). I’ve copy-pasted the contents of my comment here.
Minimal reproducible example
You can run this quick test on Google Colab, if you don’t have a GPU on your local machine.
- From the Google Colab “Runtime” menu, click “Change Runtime Type” and check the runtime type is set to “GPU”.
- Install cupy into the notebook environment. Copy-paste
!pip install cupy-cuda11x
into a notebook cell and execute it.
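(Optional) To confirm the install worked and cupy can see the GPU before running the example, a quick check like the following should do (this snippet is my suggestion, not part of the original report):
import cupy as cp
cp.cuda.runtime.getDeviceCount()  # should return 1 (or more) on a GPU runtime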
I used this docstring example to make a minimal, reproducible test.
It works as expected with a numpy-backed dask array:
import dask.array as da
import numpy as np
def outer_product(x, y):
    return np.einsum("i,j->ij", x, y)
a = da.random.normal(size=(20,30), chunks=(10, 30))
b = da.random.normal(size=(10, 1,40), chunks=(5, 1, 40))
c = da.apply_gufunc(outer_product, "(i),(j)->(i,j)", a, b, vectorize=True)
c.compute().shape
# Expected output: (10, 20, 30, 40)
# Works as expected
It fails with a cupy-backed dask array:
import dask.array as da
import cupy as cp
def outer_product(x, y):
    return cp.einsum("i,j->ij", x, y)
data_a = cp.random.normal(size=(20,30))
a = da.from_array(data_a, chunks=(10, 30))
data_b = cp.random.normal(size=(10, 1,40))
b = da.from_array(data_b, chunks=(5, 1, 40))
c = da.apply_gufunc(outer_product, "(i),(j)->(i,j)", a, b, vectorize=True)
c.compute().shape
# Expected output: (10, 20, 30, 40)
# TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy array explicitly.
Notably, this other docstring example does work with a dask array containing cupy chunks. That example does not use the vectorize keyword argument, which is probably why we don't see the problem there.
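For reference, here is a rough sketch of what such a non-vectorized gufunc call looks like with cupy chunks (adapted from the mean_std example in the apply_gufunc docstring; this adaptation is mine, not from the original discussion). Because the function reduces over the core dimension itself, vectorize=True, and therefore np.vectorize, never comes into play:
import dask.array as da
import cupy as cp

def mean_std(x):
    # Reduces over the core dimension itself, so no per-element
    # vectorization is needed and the chunks stay as cupy arrays.
    return cp.mean(x, axis=-1), cp.std(x, axis=-1)

data = cp.random.normal(size=(10, 20, 30))
a = da.from_array(data, chunks=(5, 10, 30))
mean, std = da.apply_gufunc(mean_std, "(i)->(),()", a)
mean.compute().shape
# (10, 20)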
Hypothesis
There are several lines in dask/array/gufunc.py that use numpy-specific functions (np.vectorize and np.newaxis; see here, here, and here). These lines might be causing this problem for dask arrays containing cupy chunks.
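As a quick sanity check of this hypothesis (my own experiment, not part of the original traceback), wrapping a function with np.vectorize and handing it cupy arrays appears to reproduce the same implicit-conversion error, since np.vectorize converts its arguments to numpy arrays internally:
import numpy as np
import cupy as cp

def outer_product(x, y):
    return cp.einsum("i,j->ij", x, y)

vec = np.vectorize(outer_product, signature="(i),(j)->(i,j)")
vec(cp.random.normal(size=(20,)), cp.random.normal(size=(40,)))
# Expected to raise:
# TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` ...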
What we’ve tried so far
To test this idea, I tried switching out the np.vectorize calls for the cupy.vectorize function.
Unfortunately, cupy gives me this error:
NotImplementedError: cupy.vectorize does not support `signature` option currently.
See here in the cupy source code.
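Concretely, the substitution I tried boils down to something like this sketch (simplified from what dask does internally):
import cupy as cp

def outer_product(x, y):
    return cp.einsum("i,j->ij", x, y)

# dask's gufunc machinery relies on the `signature` option, which cupy rejects:
vec = cp.vectorize(outer_product, signature="(i),(j)->(i,j)")
# NotImplementedError: cupy.vectorize does not support `signature` option currently.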
What to do next?
We could make a feature request issue in the cupy repository, asking for support for the signature keyword argument in cupy.vectorize. I don't know where that request would fit into their other priorities.
If that were implemented, Dask could then consider replacing the three numpy-specific lines in dask/array/gufunc.py with some sort of dispatching solution, so that np.vectorize is used for numpy-backed dask arrays and cupy.vectorize is used for cupy-backed dask arrays. A rough sketch of what that could look like is below.
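For illustration only, such a dispatch could look roughly like the sketch below. The helper name and the idea of inspecting the dask array's ._meta attribute are my assumptions, not existing dask API:
import numpy as np

def get_vectorize(meta):
    # Hypothetical helper: choose a vectorize implementation based on the
    # array type backing the dask array (its ._meta attribute).
    try:
        import cupy
        if isinstance(meta, cupy.ndarray):
            # Only viable once cupy.vectorize supports `signature`.
            return cupy.vectorize
    except ImportError:
        pass
    return np.vectorize
apply_gufunc could then call something like get_vectorize(x._meta) instead of hard-coding np.vectorize at the three places linked above.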
Environment:
- Dask version: 2022.2.0
- Python version: 3.8.15
- Operating System: Google Colab is running Linux version #1 SMP Fri Aug 26 08:44:51 UTC 2022
- Install method (conda, pip, source):
- dask 2022.2.0 came pre-installed on Google Colab (presumably installed with pip)
- pip install cupy-cuda11x (installs version cupy-cuda11x-11.3.0)
- CUDA information:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
Top GitHub Comments
You may already be doing this, but are you using the most recent versions of dask & distributed (version 2022.11.0 or above)? There have been some recent changes that dramatically improve memory use during computation (more details in this blogpost written by Gabe Joseph, “Reducing memory usage in Dask workloads by 80%”).
You should comment on the discussion thread here, where Gabe is asking for community feedback on workloads that do (or don’t!) work. It might be possible to adjust the config setting values to better suit your workload, but you’ll get better advice if you post on that thread. Try it first with the default settings for dask & distributed 2022.11.0 though, if you haven’t already done that.
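If you do end up adjusting settings, I believe the relevant one from that blogpost is the scheduler worker-saturation value (this is my assumption of which setting is meant; check the discussion thread for the current recommendation), e.g.:
import dask

# Assumed relevant setting; distributed >= 2022.11.0 already ships a tuned
# default, so only change it after reading the discussion thread.
dask.config.set({"distributed.scheduler.worker-saturation": 1.1})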
I hope this doesn’t feel too much like we keep sending you off to different places, but it is the best way to get each aspect of your problem (gufunc cupy compatibility, memory management, & CUDA parallelism questions) in front of the right audience for advice. I appreciate you doing this.
cc @pentschev