possible bug in cupyx sparse matrix transpose/multiplication
Hello cupy(x) developers,
I would like to report what looks like a bug in the cupyx sparse matrix library. I am running on a Skylake CPU and a Tesla V100 GPU on a SUSE-ish Linux system, with CuPy version 6.0.0 and CUDA version 10.1.168, inside a conda environment which you can find in my requirements.txt file:
Here is a reproducer in which I compare the same operations on the CPU and GPU, checking at every step that they agree. In the last step the np.allclose() check fails: the max_iCov_diff value is much larger than machine precision, which suggests there is a bug.
import numpy as np
import scipy.sparse
import cupy as cp
import cupyx as cpx

m = n = 1000
np.random.seed(1)

# create random data to use in W
random_data = np.random.random(m)

# create A, a random sparse matrix
A_cpu = scipy.sparse.random(m, n, format='csr', random_state=42)
A_gpu = cpx.scipy.sparse.csr_matrix(A_cpu)
# yank gpu back and compare
A_yank = A_gpu.get()
assert np.allclose(A_cpu.todense(), A_yank.todense())

# create W, a random diagonal sparse matrix
W_cpu = scipy.sparse.spdiags(data=random_data, diags=[0], m=m, n=n)
# cupyx operates on device arrays, so move the data over first
W_gpu = cpx.scipy.sparse.spdiags(data=cp.asarray(random_data), diags=[0], m=m, n=n)
# yank gpu back and compare
W_yank = W_gpu.get()
assert np.allclose(W_cpu.todense(), W_yank.todense())

# see how the dot products go
W_dot_A_cpu = W_cpu.dot(A_cpu)
W_dot_A_gpu = W_gpu.dot(A_gpu)
# yank gpu back and compare
W_dot_A_yank = W_dot_A_gpu.get()
assert np.allclose(W_dot_A_cpu.todense(), W_dot_A_yank.todense())

# check the transpose
A_trans_cpu = A_cpu.T
A_trans_gpu = A_gpu.T
# yank gpu back and compare
A_trans_yank = A_trans_gpu.get()  # use get() because it's a sparse object
assert np.allclose(A_trans_cpu.todense(), A_trans_yank.todense())

# okay, now the inverse covariance (where things go wrong)
iCov_cpu = A_cpu.T.dot(W_dot_A_cpu)
iCov_gpu = A_gpu.T.dot(W_dot_A_gpu)
# yank gpu back and compare
iCov_yank = iCov_gpu.get()
iCov_diff = np.abs(iCov_cpu.todense() - iCov_yank.todense())
max_iCov_diff = np.max(iCov_diff)
print("max iCov diff")
print(max_iCov_diff)
assert np.allclose(iCov_cpu.todense(), iCov_yank.todense())  # fails for large matrix sizes
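When a CPU/GPU mismatch like this shows up, a dense NumPy reference helps establish which side is actually wrong. A minimal CPU-only sketch, assuming the same construction as the reproducer but at a smaller size so the dense computation stays cheap:

```python
import numpy as np
import scipy.sparse

m = n = 200  # smaller than the reproducer so the dense reference is cheap
random_data = np.random.RandomState(1).random(m)
A = scipy.sparse.random(m, n, format='csr', random_state=42)
W = scipy.sparse.spdiags(data=random_data, diags=[0], m=m, n=n)

# dense ground truth for A.T @ W @ A, computed with plain NumPy
A_dense = A.toarray()
ref = A_dense.T @ (W.toarray() @ A_dense)

# sparse result, same formula as the reproducer
iCov = A.T.dot(W.dot(A))
assert np.allclose(iCov.toarray(), ref)
```

Running the same comparison against the `.get()` of the cupyx result would then show directly whether the GPU product deviates from the dense ground truth.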
If there is more information I can provide please let me know. Of course if I am doing something wrong I would also be happy for your feedback about how to do this correctly.
Thank you very much for your help, Laurie
Issue Analytics
- Created: 4 years ago
- Comments: 9 (2 by maintainers)
Top GitHub Comments
Hi @econtal, I think so. Following the definition there, coo2csr converts the array containing the uncompressed row indices (corresponding to COO format) into an array of compressed row pointers (corresponding to CSR format). From the CUDA documentation: “Sparse matrices in CSR format are assumed to be stored in row-major CSR format; in other words, the index arrays are first sorted by row indices and then, within the same row, by column indices. It is assumed that each pair of row and column indices appears only once.”
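To illustrate the “appears only once” caveat: COO input may legitimately contain duplicate (row, col) entries, and a canonicalizing conversion must sum them. A small scipy sketch of the semantics cupyx is expected to match:

```python
import numpy as np
import scipy.sparse

# two entries land on the same (row, col) position -- legal in COO format
rows = np.array([0, 0, 1])
cols = np.array([2, 2, 1])
vals = np.array([1.0, 2.0, 5.0])
coo = scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(3, 3))
print(coo.nnz)  # 3: duplicates are stored as-is in COO

csr = coo.tocsr()  # the COO -> CSR conversion sums duplicates
print(csr.nnz)     # 2: the duplicate pair has been merged into one entry
assert csr[0, 2] == 3.0
```

If a CSR-producing routine skips this summing step, downstream products computed from the non-canonical matrix can silently diverge.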
Anyway, I ran some tests using the attached code, generating a random (1M by 1M) matrix with 0.1G non-zeros, including duplicates, and computing the same product in different ways. The results are within numerical precision of the CPU implementation and of each other, but the timing is also interesting:
- without the modification, cupy left multiply (1st time): 3.2422685334458947 s
- with the modification, cupy left multiply (1st time): 0.013027168810367584 s

Besides, performing the same computation either as S*x (left multiply) or (xT*ST).T (right multiply), where S is in CSR and ST in CSC, can change speed quite dramatically on the GPU:
- cupy rm (1st and 2nd time): 3.269 s, 3.2578 s
- cupy lm (1st and 2nd time): 3.24 s, 0.00015796 s
- cupy lm with modified code (1st and 2nd time): 0.01377 s, 0.0001685973 s

For reference (single-threaded scipy):
- scipy lm: 0.2227 s; scipy rm: 0.22039 s
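The two formulations being timed are mathematically equivalent; a scipy sketch of the identity (timings omitted here — the GPU numbers above come from the attached script):

```python
import numpy as np
import scipy.sparse

rng = np.random.RandomState(0)
S = scipy.sparse.random(500, 500, density=0.01, format='csr', random_state=0)
x = rng.random((500, 4))

lm = np.asarray(S @ x)         # left multiply: S (CSR) times x
St = S.T.tocsc()               # the transpose, stored in CSC
rm = np.asarray((x.T @ St).T)  # right multiply: (x.T @ S.T).T

# both formulations must agree to numerical precision
assert np.allclose(lm, rm)
```

On the CPU the two paths give identical answers; the point of the timings above is that the GPU kernels backing each path can differ enormously in speed even though the math is the same.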
sparse_test.py.txt
@econtal I manually applied your patch on cupy 6.2.0 with CUDA 10.0, and the dot product result looks good. Thanks a lot for fixing this!
Personally, I would suggest explicitly adding a test case for the transposed dot product. But I am not really familiar with the cupy test setup, so I will leave it to the cupy maintainers to decide.
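For what it's worth, such a test could be written generically over the CSR constructor, so the same check runs against scipy on the host and, in the cupy suite, against cupyx.scipy.sparse. The function name and structure below are hypothetical sketches, and here the check is only exercised with scipy:

```python
import numpy as np
import scipy.sparse

def check_transposed_dot(csr_matrix, m=300, seed=42):
    """Hypothetical regression check: A.T.dot(W.dot(A)) must match scipy.

    csr_matrix is any scipy-compatible CSR constructor, e.g.
    scipy.sparse.csr_matrix or cupyx.scipy.sparse.csr_matrix.
    """
    A_np = scipy.sparse.random(m, m, density=0.05, format='csr', random_state=seed)
    diag = np.random.RandomState(seed).random(m)
    W_np = scipy.sparse.spdiags(diag, [0], m, m).tocsr()

    A, W = csr_matrix(A_np), csr_matrix(W_np)
    iCov = A.T.dot(W.dot(A))
    # pull the result back to the host if needed (cupyx sparse objects have .get())
    result = iCov.get() if hasattr(iCov, 'get') else iCov

    ref = A_np.T.dot(W_np.dot(A_np))
    assert np.allclose(np.asarray(result.todense()), np.asarray(ref.todense()))

check_transposed_dot(scipy.sparse.csr_matrix)  # host-side sanity check
```

Passing `cupyx.scipy.sparse.csr_matrix` instead would reproduce the original failure on unpatched versions.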