cuTENSOR 1.6: failure in math operations on some arrays with singleton dimensions

Description

I observed one test failure in cuCIM today when running the test suite with CuPy 11, but only when CUPY_ACCELERATORS contains “cutensor”. I have not yet gone back to try older CuPy releases, nor have I tested older versions of cuTENSOR.
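
Since the failure was seen only when CUPY_ACCELERATORS contains “cutensor”, one way to check whether cuTENSOR is involved is to rerun with that entry removed. The following is a minimal sketch, not CuPy code: the helper name without_cutensor is hypothetical, and it assumes the variable holds a comma-separated list that is read when CuPy is imported, so any new value would have to be exported before `import cupy`.

```python
import os


def without_cutensor(value):
    """Drop "cutensor" from a comma-separated accelerator list (hypothetical helper)."""
    names = [name.strip() for name in value.split(",")]
    kept = [name for name in names if name and name.lower() != "cutensor"]
    return ",".join(kept)


# Show the current setting and what it would look like without cuTENSOR.
current = os.environ.get("CUPY_ACCELERATORS", "")
print("current :", current)
print("filtered:", without_cutensor(current))
```

The sketch only prints the filtered value rather than overwriting the variable, because how CuPy treats an empty CUPY_ACCELERATORS string is not stated in this report.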

To Reproduce

A minimal reproducer is

import cupy as cp
a = cp.ones((4, 1, 5), dtype=float)
b = a.copy()
a - b

which gives the error:

CuTensorError                             Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 a - b

File ~/src/public/cupy/cupy/_core/core.pyx:1271, in cupy._core.core._ndarray_base.__sub__()

File ~/src/public/cupy/cupy/_core/_kernel.pyx:1259, in cupy._core._kernel.ufunc.__call__()

File ~/src/public/cupy/cupy/cutensor.pyx:905, in cupy.cutensor._try_elementwise_binary_routine()

File ~/src/public/cupy/cupy/cutensor.pyx:402, in cupy.cutensor._elementwise_binary_impl()

File ~/src/public/cupy/cupy_backends/cuda/libs/cutensor.pyx:507, in cupy_backends.cuda.libs.cutensor.elementwiseBinary()

File ~/src/public/cupy/cupy_backends/cuda/libs/cutensor.pyx:260, in cupy_backends.cuda.libs.cutensor.check_status()

Installation

Source (pip install cupy)

Environment

OS                           : Linux-5.14.0-1045-oem-x86_64-with-glibc2.31
Python Version               : 3.9.13
CuPy Version                 : 11.0.0
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.21.6
SciPy Version                : 1.8.1
Cython Build Version         : 0.29.30
Cython Runtime Version       : 0.29.30
CUDA Root                    : /usr/local/cuda
nvcc PATH                    : /usr/local/cuda/bin/nvcc
CUDA Build Version           : 11070
CUDA Driver Version          : 11070
CUDA Runtime Version         : 11070
cuBLAS Version               : (available)
cuFFT Version                : 10702
cuRAND Version               : 10210
cuSOLVER Version             : (11, 3, 5)
cuSPARSE Version             : (available)
NVRTC Version                : (11, 7)
Thrust Version               : 101500
CUB Build Version            : 101500
Jitify Build Version         : 4a37de0
cuDNN Build Version          : 8401
cuDNN Version                : 8401
NCCL Build Version           : 21304
NCCL Runtime Version         : 21304
cuTENSOR Version             : 10600
cuSPARSELt Build Version     : None
Device 0 Name                : NVIDIA RTX A6000
Device 0 Compute Capability  : 86
Device 0 PCI Bus ID          : 0000:17:00.0
Device 1 Name                : Quadro P620
Device 1 Compute Capability  : 61
Device 1 PCI Bus ID          : 0000:65:00.0

Additional Information

For the example above, the operation seems to fail only when a “singleton” dimension (size 1) is present and is in neither the first nor the last position of the shape. For example, the following shapes all work fine:

(1, 4, 5)
(5, 4, 1)
(2, 3, 4)

but the following shapes, each with a singleton in one of the center positions, produce the same error:

(2, 3, 1, 5)
(2, 1, 3, 5)

The failure is not specific to subtraction; it also occurs for other operations (+, /, *).
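
The pattern described above can be sketched as a small sweep over the reported shapes and operators. This is an illustration under stated assumptions, not part of the original report: has_interior_singleton and sweep are hypothetical names, and sweep takes the array module as a parameter (for example cupy) so that the predicate can be checked without a GPU.

```python
import operator

# Shapes taken from the report: the first, fifth and sixth were reported to fail.
SHAPES = [(4, 1, 5), (1, 4, 5), (5, 4, 1), (2, 3, 4), (2, 3, 1, 5), (2, 1, 3, 5)]
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def has_interior_singleton(shape):
    """True if a dimension other than the first or last has size 1."""
    return any(n == 1 for n in shape[1:-1])


def sweep(xp, shapes=SHAPES):
    """Apply each operator to two arrays of ones for every shape.

    Returns {(shape, symbol): exception class name, or None on success}.
    """
    results = {}
    for shape in shapes:
        a = xp.ones(shape, dtype=float)
        b = a.copy()
        for symbol, op in OPS.items():
            try:
                op(a, b)
                results[(shape, symbol)] = None
            except Exception as exc:
                results[(shape, symbol)] = type(exc).__name__
    return results


for shape in SHAPES:
    print(shape, "interior singleton:", has_interior_singleton(shape))
```

With an affected installation, calling sweep(cupy) after `import cupy` would be expected to record an error for exactly the shapes where has_interior_singleton returns True, if the pattern in the report holds generally.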

Top GitHub Comments

grlee77 commented, Jul 28, 2022

Ah, I see that the install docs don’t yet list 1.6.0 as supported. I had that version because I am using recent Ubuntu 20.04 packages.

I just tried with a pip install of cupy 11.x and cutensor 1.5.0 obtained via

python -m cupyx.tools.install_library --cuda 11.x --library cutensor

and do not see the issue there.
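
The environment report lists the cuTENSOR version as the integer 10600, while this comment refers to 1.6.0 and 1.5.0. A minimal sketch of the conversion, assuming the integer is encoded as major * 10000 + minor * 100 + patch (the function name is hypothetical and not part of CuPy):

```python
def decode_cutensor_version(number):
    """Split an integer version into (major, minor, patch).

    Assumes the encoding major * 10000 + minor * 100 + patch.
    """
    major, rest = divmod(number, 10000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


print(decode_cutensor_version(10600))  # (1, 6, 0)
```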

takagi commented, Oct 14, 2022

Closed. See the cuTENSOR v1.6.1 release notes: https://docs.nvidia.com/cuda/cutensor/release_notes.html#cutensor-v1-6-1
