`SuperLU.solve` leaks memory for `trans="T"`
Description
I’m using the sparse linear solver to repeatedly solve a system whose LHS matrix changes on every iteration (and I need solves with both the matrix and its transpose). Doing this exhausts my CUDA memory fairly quickly. If I refactor the code and change `splu(A).solve(b, trans="T")` into `splu(A.T).solve(b)`, memory usage stays constant for the whole run.
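The two call patterns should be interchangeable because both solve A^T x = b. As a minimal sanity check, this can be verified on CPU with SciPy, whose `splu`/`SuperLU.solve` API CuPy mirrors; the matrix and vector below are made-up example data, not from the report:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as sla

rng = np.random.default_rng(0)
n = 50
# random sparse matrix; the diagonal shift keeps it nonsingular
A = sp.random(n, n, density=0.1, random_state=rng, format="csc") + 10 * sp.eye(n, format="csc")
b = rng.standard_normal(n)

x1 = sla.splu(A).solve(b, trans="T")   # solve A^T x = b via the trans flag
x2 = sla.splu(A.T.tocsc()).solve(b)    # solve by factorizing A^T directly

assert np.allclose(x1, x2)
```

So the refactor changes only which code path performs the transposed solve, not the result, which is what makes it usable as a workaround.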
To Reproduce
```python
import cupy as cp
from cupyx.scipy.sparse import linalg

A: cp.sparse.coo_matrix = ...  # some big sparse matrix
b = cp.zeros(A.shape[0])

for i in range(2000):
    lu = linalg.splu(A)
    x = lu.solve(b, trans="T")  # leaks GPU memory on every iteration
```
Observe the memory usage with `nvidia-smi -l`; it grows until the loop eventually fails with `CuSparseError: CUSPARSE_STATUS_ALLOC_FAILED`.
On the other hand, this works fine:
```python
import cupy as cp
from cupyx.scipy.sparse import linalg

A: cp.sparse.coo_matrix = ...  # some big sparse matrix
b = cp.zeros(A.shape[0])

for i in range(2000):
    lu = linalg.splu(A.T)
    x = lu.solve(b)  # memory usage stays constant
```
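One side note on the repro (an observation about SciPy's CPU `splu`, offered for comparison since CuPy mirrors its API): `splu` factorizes in CSC form, so passing a COO matrix forces an internal conversion on every iteration, which SciPy makes visible as a `SparseEfficiencyWarning`. Converting explicitly avoids that repeated work. The matrix here is made-up example data:

```python
import warnings
import scipy.sparse as sp
import scipy.sparse.linalg as sla
from scipy.sparse import SparseEfficiencyWarning

n = 20
# made-up example matrix; the diagonal shift keeps it nonsingular
A = (sp.random(n, n, density=0.2, random_state=0) + 10 * sp.eye(n)).tocoo()

with warnings.catch_warnings(record=True) as caught_coo:
    warnings.simplefilter("always")
    sla.splu(A)            # COO input: converted to CSC internally, SciPy warns

with warnings.catch_warnings(record=True) as caught_csc:
    warnings.simplefilter("always")
    sla.splu(A.T.tocsc())  # pre-converted input: no conversion warning
```

This does not change the leak itself, but pre-converting with `.tocsc()` is cheap insurance in a tight loop like the one above.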
Installation
Conda-Forge (`conda install ...`)
Environment
```
OS                           : Linux-5.4.0-122-generic-x86_64-with-glibc2.31
Python Version               : 3.10.5
CuPy Version                 : 11.0.0
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.23.1
SciPy Version                : 1.9.0
Cython Build Version         : 0.29.30
Cython Runtime Version       : None
CUDA Root                    : /home/anadodik/miniconda3/envs/donut
nvcc PATH                    : None
CUDA Build Version           : 11020
CUDA Driver Version          : 11070
CUDA Runtime Version         : 11060
cuBLAS Version               : (available)
cuFFT Version                : 10600
cuRAND Version               : 10209
cuSOLVER Version             : (11, 3, 2)
cuSPARSE Version             : (available)
NVRTC Version                : (11, 6)
Thrust Version               : 101000
CUB Build Version            : 101000
Jitify Build Version         : 3c4a4ba
cuDNN Build Version          : 8401
cuDNN Version                : 8401
NCCL Build Version           : 21212
NCCL Runtime Version         : 21304
cuTENSOR Version             : 10500
cuSPARSELt Build Version     : None
Device 0 Name                : NVIDIA TITAN Xp
Device 0 Compute Capability  : 61
Device 0 PCI Bus ID          : 0000:04:00.0
```
Additional Information
No response
Issue Analytics
- State:
- Created: a year ago
- Comments: 13 (6 by maintainers)
Top GitHub Comments
> seems to be a leak in nvidia libraries and not CuPy

> To be fixed in #7039.