cupy.linalg.svd on 'large matrices' leads to cusolver status invalid value error


Description

Hi all,

cupy.linalg.svd raises CUSOLVERError: CUSOLVER_STATUS_INVALID_VALUE for ‘large’ matrices, e.g. of size 100,000 x 100. I didn’t get these errors on the CuPy version I was running previously; I updated the cupy package some weeks ago.

Weirdly enough, ‘small’ matrices, e.g. 10,000 x 100, give no problems. I’m running on an NVIDIA A6000, so memory should not be the problem. I couldn’t find any recent bug reports about cupy.linalg.svd. Does anyone have an idea for further investigation to find out what the cause might be?

Thanks in advance,

Roger

To Reproduce

import cupy
a = cupy.random.rand(100_000,100)
b = cupy.linalg.svd(a, full_matrices=False, compute_uv=True)
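To narrow down where ‘small’ ends and ‘large’ begins, the reproduction can be wrapped in a loop over row counts. The following is only a sketch: `probe_svd_sizes` and `main` are hypothetical helper names, the row counts are arbitrary choices, and `main` needs a machine with CuPy and a CUDA GPU.

```python
def probe_svd_sizes(make_matrix, svd, row_counts, n_cols=100):
    """Run `svd` on matrices of increasing height and record each outcome.

    make_matrix(m, n) builds an m x n input; svd(a) runs the decomposition.
    Both are passed in so the probe itself does not depend on a GPU.
    """
    results = {}
    for m in row_counts:
        try:
            svd(make_matrix(m, n_cols))
            results[m] = "ok"
        except Exception as exc:  # e.g. CUSOLVERError on an affected setup
            results[m] = f"{type(exc).__name__}: {exc}"
    return results


def main():
    # Assumption: CuPy is installed and a CUDA device is available.
    import cupy

    outcome = probe_svd_sizes(
        cupy.random.rand,
        lambda a: cupy.linalg.svd(a, full_matrices=False, compute_uv=True),
        row_counts=[10_000, 25_000, 50_000, 100_000],  # arbitrary probe points
    )
    for m, status in outcome.items():
        print(m, status)
```

Calling `main()` on the affected machine prints one line per row count, which shows roughly at which height the error starts to appear.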

Installation

Conda-Forge (conda install ...)

Environment

OS                           : Linux-5.15.0-48-generic-x86_64-with-glibc2.35
Python Version               : 3.9.13
CuPy Version                 : 11.1.0
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.23.1
SciPy Version                : 1.8.0
Cython Build Version         : 0.29.32
Cython Runtime Version       : None
CUDA Root                    : /home/roger/miniconda3
nvcc PATH                    : /home/roger/miniconda3/bin/nvcc
CUDA Build Version           : 10020
CUDA Driver Version          : 11070
CUDA Runtime Version         : 10020
cuBLAS Version               : (available)
cuFFT Version                : 10102
cuRAND Version               : 10102
cuSOLVER Version             : (10, 3, 0)
cuSPARSE Version             : (available)
NVRTC Version                : (10, 2)
Thrust Version               : 100907
CUB Build Version            : <unknown>
Jitify Build Version         : 3ecec55
cuDNN Build Version          : None
cuDNN Version                : None
NCCL Build Version           : None
NCCL Runtime Version         : None
cuTENSOR Version             : None
cuSPARSELt Build Version     : None
Device 0 Name                : NVIDIA RTX A6000
Device 0 Compute Capability  : 86
Device 0 PCI Bus ID          : 0000:73:00.0
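
The CUDA entries in this listing are integer codes of the form 1000 * major + 10 * minor, so 10020 means CUDA 10.2 and 11070 means CUDA 11.7. In other words, CuPy here is built against and running on the CUDA 10.2 runtime, while the driver supports up to 11.7. A minimal sketch for decoding these values follows; `decode_cuda_version` and `main` are hypothetical helper names, and `main` assumes CuPy and a CUDA device are available.

```python
def decode_cuda_version(code):
    """Split a CUDA version code (1000 * major + 10 * minor) into (major, minor)."""
    major, rest = divmod(code, 1000)
    return major, rest // 10


def main():
    # Assumption: CuPy is installed and a CUDA device is available.
    import cupy

    runtime = decode_cuda_version(cupy.cuda.runtime.runtimeGetVersion())
    driver = decode_cuda_version(cupy.cuda.runtime.driverGetVersion())
    print("CUDA runtime: %d.%d" % runtime)
    print("CUDA driver : %d.%d" % driver)
    if runtime < driver:
        print("The runtime in use is older than what the driver supports.")
```

For the values in the listing above, `decode_cuda_version(10020)` gives `(10, 2)` and `decode_cuda_version(11070)` gives `(11, 7)`.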

Additional Information

The full error output is:

---------------------------------------------------------------------------
CUSOLVERError                             Traceback (most recent call last)
Input In [13], in <cell line: 1>()
----> 1 b = cupy.linalg.svd(a, full_matrices=False, compute_uv=True)

File ~/miniconda3/lib/python3.9/site-packages/cupy/linalg/_decomposition.py:566, in svd(a, full_matrices, compute_uv)
    564     rwork = cupy.empty(min(m, n)-1, dtype=s_dtype)
    565     rwork_ptr = rwork.data.ptr
--> 566 gesvd(
    567     handle, job_u, job_vt, m, n, x.data.ptr, m, s.data.ptr, u_ptr, m,
    568     vt_ptr, n, workspace.data.ptr, buffersize, rwork_ptr,
    569     dev_info.data.ptr)
    570 cupy.linalg._util._check_cusolver_dev_info_if_synchronization_allowed(
    571     gesvd, dev_info)
    573 s = s.astype(s_dtype, copy=False)

File cupy_backends/cuda/libs/cusolver.pyx:2731, in cupy_backends.cuda.libs.cusolver.dgesvd()

File cupy_backends/cuda/libs/cusolver.pyx:2740, in cupy_backends.cuda.libs.cusolver.dgesvd()

File cupy_backends/cuda/libs/cusolver.pyx:1079, in cupy_backends.cuda.libs.cusolver.check_status()

CUSOLVERError: CUSOLVER_STATUS_INVALID_VALUE

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

3 reactions
takagi commented, Sep 29, 2022

Hi, I reproduced the error on CUDA 10.2 with CuPy 8 and later, including CuPy 11. I also checked CUDA 11.7 with CuPy 11, and the error did not occur there. I see your CUDA driver version is 11.7. Could you update the CUDA Toolkit to the latest version and give it a try?

0 reactions
RogerMoens commented, Oct 3, 2022

@leofang Aah check, that explains a lot.

Thanks for all the help!
