bug in sparse matrix conversion COO->CSR for some matrices

I am finding that COO->CSR (or CSC) conversion does not agree with scipy when the underlying matrix is large (roughly nnz > 50k elements). The problem does not seem to occur for small matrices such as those used in the test suite.

Conditions (you can just paste the output of python -c 'import cupy; cupy.show_config()')

CuPy Version         : 6.0.0b2
CUDA Root            : /usr/local/cuda
CUDA Build Version   : 10000
CUDA Driver Version  : 10000
CUDA Runtime Version : 10000

Code to reproduce

The data for the example below is ~1MB in size and is available via: https://drive.google.com/open?id=1BRaTNQoAYJPOjfGyio51OwwmCZdh3vdT

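import numpy as np
import scipy.sparse  # a bare "import scipy" does not reliably load the sparse submodule
import cupy
import cupyx.scipy.sparse  # likewise, import the GPU sparse submodule explicitly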

data = np.load('/tmp/coo_to_csc_example_100k.npz')

sl = slice(None)
A_cpu = scipy.sparse.coo_matrix((data['data'][sl],
                                 (data['row'][sl], data['col'][sl])))

A_gpu = cupyx.scipy.sparse.coo_matrix(A_cpu)

# this comparison is okay
cupy.testing.assert_allclose(A_gpu.data, A_cpu.data)

# conversion on the GPU does not match for large matrices
A_csr_gpu = A_gpu.tocsr()
A_csr_cpu = A_cpu.tocsr()
cupy.testing.assert_allclose(A_csr_gpu.data,
                             A_csr_cpu.data)

# converting via a round trip to the CPU works fine
A_csr_gpu2 = cupyx.scipy.sparse.csr_matrix(A_gpu.get().tocsr())
cupy.testing.assert_allclose(A_csr_gpu2.data,
                             A_csr_cpu.data)

And oddly, the data array after conversion does match at the beginning and then has huge mismatches in many of the later elements. I plot this below:


import matplotlib.pyplot as plt
plt.figure()
# difference between the CPU and GPU data arrays from the CSR conversion above
plt.plot(A_csr_cpu.data - A_csr_gpu.data.get())

[Figure: plot of the difference between the CPU and GPU data arrays after conversion]

The input data has values in the range [0, 1.0]. In this example the data is dtype np.float32, but the same thing occurs for float64 as well.
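For readers without the data file or a GPU, the expected result can be sketched on the CPU alone. The block below is a minimal NumPy reference for COO->CSR conversion, not CuPy's implementation: the helper name `coo_to_csr_reference` is hypothetical, the input is a synthetic stand-in for the issue's data, and the sketch assumes there are no duplicate (row, col) entries.

```python
import numpy as np
import scipy.sparse


def coo_to_csr_reference(row, col, data, n_rows):
    """Reference COO -> CSR conversion on the CPU (assumes no duplicate entries)."""
    # Sort entries by row, breaking ties by column.
    order = np.lexsort((col, row))
    # Gather into freshly allocated arrays; the inputs are never overwritten.
    csr_data = data[order]
    csr_indices = col[order]
    # indptr[i] is the offset of the first stored entry of row i.
    counts = np.bincount(row, minlength=n_rows)
    indptr = np.concatenate(([0], np.cumsum(counts)))
    return csr_data, csr_indices, indptr


# Synthetic stand-in for the issue's data file: 100k unique entries in [0, 1).
rng = np.random.default_rng(0)
n_rows, n_cols, nnz = 2000, 3000, 100_000
flat = rng.choice(n_rows * n_cols, size=nnz, replace=False)
row, col = np.divmod(flat, n_cols)
data = rng.random(nnz).astype(np.float32)

csr_data, csr_indices, indptr = coo_to_csr_reference(row, col, data, n_rows)

expected = scipy.sparse.coo_matrix(
    (data, (row, col)), shape=(n_rows, n_cols)).tocsr()
expected.sort_indices()  # make the within-row ordering explicit
np.testing.assert_array_equal(csr_data, expected.data)
```

The point of the sketch is that the data array of a correct conversion is just a permutation of the COO data, so any GPU result can be checked against it element by element.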

Issue Analytics

Created 5 years ago
Reactions: 1
Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction

econtal commented, Aug 16, 2019

I believe the issue is that x.data is used as both the input and the output of cuSPARSE’s gthr here: https://github.com/cupy/cupy/blob/d0dd06d5145f73da4c56d8678ddf64cea702106a/cupy/cusparse.py#L583

This can be fixed by creating another buffer. I will submit a PR!
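The hazard described in this comment can be illustrated without a GPU. The sketch below is a CPU model only: it uses a serial Python loop in place of the parallel cuSPARSE gather, and the function names `gather_in_place` and `gather_with_buffer` are hypothetical. It shows why a gather that writes into its own input can return already-overwritten values, and why a separate output buffer avoids that.

```python
import numpy as np


def gather_in_place(values, perm):
    """Gather that writes into its own input: later reads may see overwritten slots."""
    for i in range(len(perm)):
        values[i] = values[perm[i]]
    return values


def gather_with_buffer(values, perm):
    """Gather into a separate output buffer: every read sees the original input."""
    out = np.empty_like(values)
    for i in range(len(perm)):
        out[i] = values[perm[i]]
    return out


values = np.array([10.0, 20.0, 30.0, 40.0])
perm = np.array([3, 2, 1, 0])

expected = values[perm]  # NumPy fancy indexing returns a new array
buffered = gather_with_buffer(values.copy(), perm)
aliased = gather_in_place(values.copy(), perm)

print("expected:", expected)
print("buffered:", buffered)
print("aliased: ", aliased)
```

In this toy case the aliased result agrees with the expected one in its first elements and diverges afterwards. On a real GPU the order of reads and writes is not sequential, so the exact pattern of corruption would differ.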

1 reaction

grlee77 commented, Feb 6, 2019

Many functions such as __mul__ first convert to CSR, so this is pretty problematic.

The only workaround I found so far is to transfer to the host, convert via scipy’s tocsr() and then transfer back to the GPU, but that is obviously not ideal.
