`SuperLU.solve` leaks memory for `trans="T"`

Description

I’m trying to use the sparse linear solver to repeatedly solve a problem where the LHS matrix is constantly changing (and I need both it and its transpose). However, doing this seems to blow up my CUDA memory fairly quickly. If I refactor my code and change splu(A).solve(b, trans="T") into splu(A.T).solve(b) the memory stays constant throughout the program.

To Reproduce


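import cupy as cp
import torch
from cupyx.scipy.sparse import linalg

A = ...  # some big sparse matrix (a cupyx.scipy.sparse.coo_matrix)
b = cp.asarray(torch.zeros(A.shape[0]))
for i in range(2000):
    lu = linalg.splu(A)
    x = lu.solve(b, trans="T")  # transposed solve; device memory grows every iteration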

Watch the memory usage with nvidia-smi -l: it keeps growing until the program eventually fails with CuSparseError: CUSPARSE_STATUS_ALLOC_FAILED.

On the other hand, this works fine:


A = ...  # some big sparse matrix (a cupyx.scipy.sparse.coo_matrix)
b = cp.asarray(torch.zeros(A.shape[0]))
for i in range(2000):
    lu = linalg.splu(A.T)  # factorize the explicit transpose instead
    x = lu.solve(b)        # plain solve; memory stays constant
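
As a complement to watching nvidia-smi, the growth can also be logged from inside the loop. The sketch below is one hypothetical way to do that: track_used_bytes and grows_steadily are illustrative names, not CuPy APIs, and the memory probe is passed in as a callable so the example runs without a GPU.

```python
from typing import Callable, List


def track_used_bytes(step: Callable[[], None],
                     used_bytes: Callable[[], int],
                     iterations: int) -> List[int]:
    """Call step() repeatedly and record used_bytes() after each call."""
    readings = []
    for _ in range(iterations):
        step()
        readings.append(used_bytes())
    return readings


def grows_steadily(readings: List[int]) -> bool:
    """True if usage never drops and ends higher than it started."""
    if len(readings) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(readings, readings[1:]))
    return never_drops and readings[-1] > readings[0]


if __name__ == "__main__":
    # Stand-in for a leaking solve: each call keeps 1 KiB alive forever.
    leaked = []
    readings = track_used_bytes(
        step=lambda: leaked.append(bytearray(1024)),
        used_bytes=lambda: sum(len(chunk) for chunk in leaked),
        iterations=5,
    )
    print(readings, grows_steadily(readings))
```

On a GPU, the probe could be built from cp.cuda.runtime.memGetInfo(), which returns the free and total device memory in bytes, so used = total - free. That wiring is an assumption about the setup, not something taken from the issue. The step would be one iteration of either loop above.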

Installation

Conda-Forge (conda install ...)

Environment

OS                           : Linux-5.4.0-122-generic-x86_64-with-glibc2.31
Python Version               : 3.10.5
CuPy Version                 : 11.0.0
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.23.1
SciPy Version                : 1.9.0
Cython Build Version         : 0.29.30
Cython Runtime Version       : None
CUDA Root                    : /home/anadodik/miniconda3/envs/donut
nvcc PATH                    : None
CUDA Build Version           : 11020
CUDA Driver Version          : 11070
CUDA Runtime Version         : 11060
cuBLAS Version               : (available)
cuFFT Version                : 10600
cuRAND Version               : 10209
cuSOLVER Version             : (11, 3, 2)
cuSPARSE Version             : (available)
NVRTC Version                : (11, 6)
Thrust Version               : 101000
CUB Build Version            : 101000
Jitify Build Version         : 3c4a4ba
cuDNN Build Version          : 8401
cuDNN Version                : 8401
NCCL Build Version           : 21212
NCCL Runtime Version         : 21304
cuTENSOR Version             : 10500
cuSPARSELt Build Version     : None
Device 0 Name                : NVIDIA TITAN Xp
Device 0 Compute Capability  : 61
Device 0 PCI Bus ID          : 0000:04:00.0

Additional Information

No response

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 13 (6 by maintainers)

Top GitHub Comments

emcastillo commented, Aug 7, 2022 (1 reaction)

Seems to be a leak in NVIDIA libraries and not CuPy.

leofang commented, Sep 6, 2022 (0 reactions)

To be fixed in #7039.


