sgesvd_bufferSize int32 overflow with CUDA 10.1
After upgrading from CUDA 9.0 to CUDA 10.1, I noticed I'm no longer able to compute the SVD of big matrices because of an int32 overflow in `sgesvd_bufferSize`, here: https://github.com/cupy/cupy/blob/master/cupy/linalg/decomposition.py#L257

When the buffer size is just above 2**31 = 2147483648, `sgesvd_bufferSize` fails with `CUSOLVER_STATUS_INVALID_VALUE`, most likely because the size wraps around to a negative value. If you keep increasing the matrix size, you get positive values again, but wrong ones. See graph below.

This might be an issue with cuSOLVER itself rather than CuPy, but since I'm not familiar with testing CUDA without CuPy I can't tell. It might also be related to #1365, but I haven't tested on CUDA 9.1 or 10.0.
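The negative-then-wrong-positive pattern in the graph is exactly what a 32-bit wraparound produces. A minimal NumPy sketch (independent of CUDA) of how a true 64-bit size looks after being squeezed through a signed int32:

```python
import numpy as np

def as_int32(true_size):
    """Simulate storing a 64-bit buffer size in a 32-bit signed int."""
    return int(np.int64(true_size).astype(np.int32))

print(as_int32(2**31 - 1))   # 2147483647 -- still fits
print(as_int32(2**31))       # -2147483648 -- wraps negative
print(as_int32(2**32 + 10))  # 10 -- positive again, but wrong
```

Sizes between 2**31 and 2**32 come out negative (the `CUSOLVER_STATUS_INVALID_VALUE` case); sizes at or above 2**32 come out positive but too small, matching the wrong values observed.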
```python
import numpy
from cupy.cuda import cusolver
from cupy.cuda import device

handle = device.get_cusolver_handle()

def test(m):
    # Query the workspace size for an m x 1 single-precision SVD.
    try:
        return cusolver.sgesvd_bufferSize(handle, m, 1)
    except Exception:
        return numpy.nan

values = [(m, test(m)) for m in numpy.linspace(1, 150000, 200).astype('int')]
```
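Until running on a fixed CUDA version, callers can at least catch the most obvious symptom (a wrapped, non-positive size) before handing it to `sgesvd`. A hypothetical guard as a sketch; the helper name is illustrative, not CuPy or cuSOLVER API:

```python
def checked_lwork(lwork):
    """Reject buffer sizes that look like an int32 wraparound.

    A non-positive value is a sure sign the true size exceeded 2**31 - 1.
    Silently-wrong positive values (true size >= 2**32) cannot be caught
    this way.
    """
    if lwork <= 0:
        raise OverflowError(
            'gesvd_bufferSize returned %d; the workspace size likely '
            'overflowed int32' % lwork)
    return lwork
```

Usage would be e.g. `checked_lwork(cusolver.sgesvd_bufferSize(handle, m, 1))`.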
CuPy Version : 6.2.0
CUDA Build Version : 10010
Issue Analytics
- Created: 4 years ago
- Comments: 11 (9 by maintainers)
Top GitHub Comments
This issue should have been fixed in CUDA 11.0.
Thanks again for the information. The library team has now recognized this issue. The much larger workspace requirement appears to be a side effect of a performance improvement introduced in CUDA 10.1.