sgesvd_bufferSize int32 overflow with CUDA 10.1

After upgrading from CUDA 9.0 to CUDA 10.1, I noticed that I can no longer compute the SVD of big matrices because of an int32 overflow in sgesvd_bufferSize here: https://github.com/cupy/cupy/blob/master/cupy/linalg/decomposition.py#L257 When bufferSize is just above 2**31 = 2147483648, sgesvd_bufferSize fails with CUSOLVER_STATUS_INVALID_VALUE, likely because the value is wrongly cast to a negative number. If you keep increasing the matrix size, you get positive values again, but wrong ones. See the graph below.
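To illustrate the suspected mechanism, here is a minimal sketch of how a size wraps around when it is stored in a signed 32-bit integer. It is not cuSOLVER's actual code; `wrap_to_int32` is a hypothetical helper written only for this illustration.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def wrap_to_int32(value):
    """Reinterpret an integer as a signed 32-bit value (two's complement)."""
    # Hypothetical helper: mimics a truncating cast to int32.
    return (value - INT32_MIN) % 2**32 + INT32_MIN


# Sizes below, at, and beyond the int32 limit.
for size in (2**31 - 1, 2**31, 2**31 + 1000, 2**32 + 1000):
    print(size, "->", wrap_to_int32(size))
```

Under this assumption, a size just above 2**31 turns negative (which a library would reject as an invalid value), and a size above 2**32 turns positive again but is far too small, which matches the pattern described above.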

This might be an issue with cuSOLVER itself rather than cupy, but since I'm not familiar with testing CUDA without cupy, I can't tell. It might be related to #1365 as well, but I haven't tested on CUDA 9.1 or 10.0.

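import numpy
from cupy.cuda import cusolver
from cupy.cuda import device

handle = device.get_cusolver_handle()


def test(m):
    # Query the workspace size for the SVD of an m x 1 float32 matrix.
    # Return NaN when cuSOLVER rejects the request.
    try:
        return cusolver.sgesvd_bufferSize(handle, m, 1)
    except cusolver.CUSOLVERError:
        return numpy.nan


# Sweep m from 1 to 150000 and record the reported buffer size.
values = [(m, test(m)) for m in numpy.linspace(1, 150000, 200).astype('int')]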

[Figure: sgesvd_buffersize_overflow]

CuPy Version          : 6.2.0
CUDA Build Version    : 10010

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 11 (9 by maintainers)

Top GitHub Comments

anaruse commented, Nov 20, 2020 (4 reactions)

This issue should have been fixed in CUDA 11.0.

anaruse commented, Aug 8, 2019 (4 reactions)

Thanks again for the information. The library team is now aware of this issue. The much larger workspace requirement appears to be a side effect of a performance improvement introduced in CUDA 10.1.
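Based only on what this thread states (reported on CUDA 10.1, said to be fixed in CUDA 11.0), a rough check for a possibly affected CUDA version could look like the sketch below. Both function names are hypothetical, and the affected range is an assumption drawn from the comments above, not an official statement. CUDA 10.0 and earlier were not tested in this thread and are treated as unaffected here.

```python
def decode_cuda_version(version):
    """Split a CUDA version integer (1000 * major + 10 * minor) into (major, minor)."""
    return version // 1000, (version % 1000) // 10


def may_hit_buffersize_overflow(version):
    # Assumption from this thread only: seen on 10.1, reported fixed in 11.0.
    return (10, 1) <= decode_cuda_version(version) < (11, 0)


# 10010 is the "CUDA Build Version" reported in this issue.
print(decode_cuda_version(10010), may_hit_buffersize_overflow(10010))
```

With CuPy installed, the integer can be obtained from `cupy.cuda.runtime.runtimeGetVersion()`.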
