Hitting a precision limit in `cupy.sparse.csr_matrix` when using `multiply`.

I’ve come across a bug in cupy.sparse.csr_matrix that occurs only intermittently. I have created a minimal working example below. The example is contrived so that the error occurs every time, and the matrix in it is actually fully dense. I am hitting the same bug in my research, where I use a genuinely sparse matrix, but there it does not happen consistently.

In the MWE we create a huge dense matrix, store it as a csr_matrix, and multiply it by a vector. Internally, the multiply method converts the csr_matrix into a coo_matrix. This works on the first call, but the second call fails with the error message shown below.

It seems that the col array overflows the int32 range and then contains values that are either extremely large and negative or extremely large and positive, which triggers the error. However, the array is not large enough to hit that limit: the matrix has about 105 million stored entries, far below the int32 maximum of roughly 2.1 billion.
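For scale, here is a quick arithmetic check of the sizes used in the reproduction (a (2048**2, 25) matrix in which every entry is stored) against the signed 32-bit range. The variable names are illustrative only.

```python
# Sizes taken from the reproduction: a (2048**2, 25) matrix of ones,
# so every entry is stored as a nonzero.
n_rows = 2048**2
n_cols = 25
nnz = n_rows * n_cols

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold

print(f"rows      = {n_rows:,}")
print(f"nonzeros  = {nnz:,}")
print(f"int32 max = {INT32_MAX:,}")
print(f"headroom  = {INT32_MAX // nnz}x")
```

The stored-entry count sits roughly a factor of 20 under the int32 maximum, so a plain index overflow from the matrix size alone looks unlikely.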

  • Conditions (you can just paste the output of python -c 'import cupy; cupy.show_config()')
OS                           : Linux-4.12.14-122.80.1.20210720-nasa-x86_64-with-glibc2.10
Python Version               : 3.8.5
CuPy Version                 : 9.4.0
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.19.2
SciPy Version                : 1.7.1
Cython Build Version         : 0.29.24
Cython Runtime Version       : None
CUDA Root                    : /home4/jimartin/.conda/envs/gpu_hack
nvcc PATH                    : None
CUDA Build Version           : 11000
CUDA Driver Version          : 11000
CUDA Runtime Version         : 11000
cuBLAS Version               : (available)
cuFFT Version                : 10201
cuRAND Version               : 10201
cuSOLVER Version             : (10, 6, 0)
cuSPARSE Version             : (available)
NVRTC Version                : (11, 0)
Thrust Version               : 100909
CUB Build Version            : 100909
Jitify Build Version         : <unknown>
cuDNN Build Version          : None
cuDNN Version                : None
NCCL Build Version           : None
NCCL Runtime Version         : None
cuTENSOR Version             : None
cuSPARSELt Build Version     : None
Device 0 Name                : Tesla V100-SXM2-32GB
Device 0 Compute Capability  : 70
Device 0 PCI Bus ID          : 0000:16:00.0
  • Code to reproduce
from cupy import sparse
import cupy as cp

def run():
    A = sparse.csr_matrix(cp.ones((2048**2, 25)))
    return A.multiply(cp.random.normal(size=2048**2)[:, None])
run()
run()
  • Error messages, stack traces, or logs
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-ec9775ede022> in <module>
----> 1 run()

<ipython-input-1-1d6522e0f920> in run()
      4 def run():
      5     A = sparse.csr_matrix(cp.ones((2048**2, 25)))
----> 6     return A.multiply(cp.random.normal(size=2048**2)[:, None])

~/.conda/envs/gpu_hack/lib/python3.8/site-packages/cupyx/scipy/sparse/csr.py in multiply(self, other)
    321             return multiply_by_scalar(self, other)
    322         elif _util.isdense(other):
--> 323             self.sum_duplicates()
    324             other = cupy.atleast_2d(other)
    325             return multiply_by_dense(self, other)

~/.conda/envs/gpu_hack/lib/python3.8/site-packages/cupyx/scipy/sparse/compressed.py in sum_duplicates(self)
    776         # TODO(leofang): add a kernel for compressed sparse matrices without
    777         # converting to coo
--> 778         coo = self.tocoo()
    779         coo.sum_duplicates()
    780         self.__init__(coo.asformat(self.format))

~/.conda/envs/gpu_hack/lib/python3.8/site-packages/cupyx/scipy/sparse/csr.py in tocoo(self, copy)
    436             indices = self.indices
    437 
--> 438         return cusparse.csr2coo(self, data, indices)
    439 
    440     def tocsc(self, copy=False):

~/.conda/envs/gpu_hack/lib/python3.8/site-packages/cupy/cusparse.py in csr2coo(x, data, indices)
    928         _cusparse.CUSPARSE_INDEX_BASE_ZERO)
    929     # data and indices did not need to be copied already
--> 930     return cupyx.scipy.sparse.coo_matrix(
    931         (data, (row, indices)), shape=x.shape)
    932 

~/.conda/envs/gpu_hack/lib/python3.8/site-packages/cupyx/scipy/sparse/coo.py in __init__(self, arg1, shape, dtype, copy)
    157                 raise ValueError('row index exceeds matrix dimensions')
    158             if col.max() >= shape[1]:
--> 159                 raise ValueError('column index exceeds matrix dimensions')
    160             if row.min() < 0:
    161                 raise ValueError('negative row index found')

ValueError: column index exceeds matrix dimensions
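
The traceback shows the call path multiply → sum_duplicates → tocoo → csr2coo, with the failure raised by the index validation in the coo_matrix constructor. As a rough illustration of what that conversion and check amount to, here is a pure-Python sketch. The helper names expand_indptr and validate_coo are hypothetical and are not CuPy functions; the real conversion runs in cuSPARSE on the GPU, and the negative-column check is an assumption modeled on the row check visible above.

```python
def expand_indptr(indptr):
    """Expand CSR row pointers into one explicit row index per stored entry."""
    rows = []
    for i in range(len(indptr) - 1):
        count = indptr[i + 1] - indptr[i]  # entries stored in row i
        rows.extend([i] * count)
    return rows


def validate_coo(row, col, shape):
    """Bounds checks modeled on the ones visible in the traceback."""
    if max(row) >= shape[0]:
        raise ValueError('row index exceeds matrix dimensions')
    if max(col) >= shape[1]:
        raise ValueError('column index exceeds matrix dimensions')
    if min(row) < 0:
        raise ValueError('negative row index found')
    if min(col) < 0:
        raise ValueError('negative column index found')


# A tiny 3x4 CSR matrix: the column indices pass through unchanged,
# only the row pointers are expanded.
indptr = [0, 2, 3, 5]
indices = [0, 3, 1, 2, 3]
shape = (3, 4)

row = expand_indptr(indptr)
validate_coo(row, indices, shape)  # passes: all indices are in range
print(row)

# If the column buffer held garbage, the same check would raise,
# as in the reported error.
corrupted = [0, 3, 1, 2, 2**31 - 1]
try:
    validate_coo(row, corrupted, shape)
except ValueError as exc:
    print(exc)
```

Because the column indices are handed to the COO constructor as they are, an out-of-range value caught there points at the contents of the indices buffer, not at the row expansion.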

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

dmargala commented, Dec 10, 2021

@christinahedges I tested the example code on the Perlmutter system at NERSC using the cudatoolkit modules available on that system. I can confirm this fails in 11.0 and seems to work starting in 11.2 (there does not seem to be a cudatoolkit module compatible with 11.1).

cudatoolkit module      cupy package          result
cudatoolkit/21.3_11.0   cupy-cuda110==9.4.0   fail
cudatoolkit/21.3_11.2   cupy-cuda112==9.4.0   pass
cudatoolkit/21.9_11.0   cupy-cuda110==9.4.0   fail
cudatoolkit/21.9_11.4   cupy-cuda112==9.4.0   pass
cudatoolkit/21.9_11.4   cupy-cuda113==9.4.0   pass
cudatoolkit/21.9_11.4   cupy-cuda114==9.4.0   pass

takagi commented, Nov 26, 2021

I’ve confirmed that cupy-cuda110==9.4.0 and cupy-cuda111==9.4.0 reproduce the error, while cupy-cuda112==9.4.0 doesn’t.
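
Going only by the results reported in these comments for CuPy 9.4.0 (failures with CUDA 11.0 and 11.1, passes from 11.2 onward), a script could guard against the affected toolkits. This is only a sketch: cuda_version_affected is a hypothetical helper, and the version list is inferred from this thread, not from a documented fix. It assumes the integer encoding that appears in the show_config() output (11000 for CUDA 11.0), which is also what cupy.cuda.runtime.runtimeGetVersion() returns.

```python
def cuda_version_affected(runtime_version: int) -> bool:
    """Return True for CUDA runtimes reported as failing in this thread.

    runtime_version uses CUDA's integer encoding, major * 1000 + minor * 10
    (for example 11000 for CUDA 11.0). Only 11.0 and 11.1 were reported as
    failing; versions before 11.0 were not tested here, so they return False.
    """
    major = runtime_version // 1000
    minor = (runtime_version % 1000) // 10
    return (major, minor) in {(11, 0), (11, 1)}


for version in (11000, 11010, 11020, 11040):
    if cuda_version_affected(version):
        status = "reported failing"
    else:
        status = "not reported failing"
    print(version, status)

# With CuPy installed, the live value could be obtained with:
#   import cupy
#   cuda_version_affected(cupy.cuda.runtime.runtimeGetVersion())
```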


