bug in sparse matrix conversion COO->CSR for some matrices

I am finding that conversion of COO->CSR or CSC does not agree with scipy when the underlying matrix is large (nnz > 50k elements or so). The problem does not seem to occur for small matrices such as those used in the test suite.

  • Conditions (you can just paste the output of python -c 'import cupy; cupy.show_config()')

CuPy Version          : 6.0.0b2
CUDA Root             : /usr/local/cuda
CUDA Build Version    : 10000
CUDA Driver Version   : 10000
CUDA Runtime Version  : 10000

  • Code to reproduce

The data for the example below is ~1MB in size and is available via: https://drive.google.com/open?id=1BRaTNQoAYJPOjfGyio51OwwmCZdh3vdT

import numpy as np
import scipy.sparse        # import the submodule explicitly
import cupy
import cupyx.scipy.sparse  # import the submodule explicitly

data = np.load('/tmp/coo_to_csc_example_100k.npz')

sl = slice(None)
A_cpu = scipy.sparse.coo_matrix((data['data'][sl],
                                 (data['row'][sl], data['col'][sl])))

A_gpu = cupyx.scipy.sparse.coo_matrix(A_cpu)

# this comparison is okay
cupy.testing.assert_allclose(A_gpu.data, A_cpu.data)

# conversion on the GPU does not match for large matrices
A_csr_gpu = A_gpu.tocsr()
A_csr_cpu = A_cpu.tocsr()
cupy.testing.assert_allclose(A_csr_gpu.data,
                             A_csr_cpu.data)

# converting on the CPU and transferring the CSR result back to the GPU works fine
A_csr_gpu2 = cupyx.scipy.sparse.csr_matrix(A_gpu.get().tocsr())
cupy.testing.assert_allclose(A_csr_gpu2.data,
                             A_csr_cpu.data)

And oddly, the data array after conversion does match at the beginning and then has huge mismatches in many of the later elements. I plot this below:


import matplotlib.pyplot as plt
plt.figure()
# difference between the CPU and GPU CSR data arrays, by element index
plt.plot(A_csr_cpu.data - A_csr_gpu.data.get())

[Figure: difference between the CPU and GPU data arrays after conversion, plotted against element index]

The input data has values in the range [0, 1.0]. In this example the data is dtype np.float32, but the same thing occurs for float64 as well.
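To put numbers on "matches at the beginning, then mismatches", it can help to report the first differing index and the mismatch count instead of relying only on a plot. The following is a minimal sketch using NumPy only; `summarize_mismatch` is a hypothetical helper (not part of CuPy or SciPy), and the arrays are synthetic stand-ins for the real CPU and GPU data arrays.

```python
import numpy as np


def summarize_mismatch(expected, actual, rtol=1e-7):
    """Return (first mismatching index or None, number of mismatches)."""
    expected = np.asarray(expected)
    actual = np.asarray(actual)
    # Elementwise relative comparison with no absolute tolerance.
    bad = np.flatnonzero(~np.isclose(actual, expected, rtol=rtol, atol=0.0))
    first = int(bad[0]) if bad.size else None
    return first, int(bad.size)


# Synthetic stand-in data: the first five entries agree, the tail is reversed.
expected = np.linspace(0.0, 1.0, 8, dtype=np.float32)
actual = expected.copy()
actual[5:] = expected[5:][::-1]

first_bad, n_bad = summarize_mismatch(expected, actual)
print(first_bad, n_bad)
```

With the arrays from the reproduction, the call would presumably be `summarize_mismatch(A_csr_cpu.data, A_csr_gpu.data.get())`.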

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 1
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

econtal commented, Aug 16, 2019 (1 reaction)

I believe the issue is that x.data is used as both the input and the output of cuSPARSE’s gthr here: https://github.com/cupy/cupy/blob/d0dd06d5145f73da4c56d8678ddf64cea702106a/cupy/cusparse.py#L583

This can be fixed by creating another buffer. I will submit a PR!
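The hazard of gathering into the same buffer that is being read can be illustrated on the CPU. This is a minimal sketch in pure Python, not CuPy's or cuSPARSE's actual code: `gather_to_new_buffer` and `gather_in_place` are hypothetical helpers, the permutation is made up, and the sequential loop only stands in for a GPU kernel whose threads read and write the same array in an unspecified order.

```python
def gather_to_new_buffer(values, perm):
    """Gather out[i] = values[perm[i]] into a separate output buffer."""
    return [values[p] for p in perm]


def gather_in_place(values, perm):
    """Same gather, but writing into the input buffer.

    Entries overwritten early are read again later, so the result
    can differ from the out-of-place gather.
    """
    for i, p in enumerate(perm):
        values[i] = values[p]
    return values


values = [10.0, 20.0, 30.0, 40.0]
perm = [2, 0, 3, 1]  # hypothetical permutation, e.g. from sorting entries by row

expected = gather_to_new_buffer(values, perm)
aliased = gather_in_place(list(values), perm)  # work on a copy of the input

print(expected)  # [30.0, 10.0, 40.0, 20.0]
print(aliased)   # [30.0, 30.0, 40.0, 30.0]
```

In this toy case the first output element is still correct and the later ones are corrupted, because they read slots that were already overwritten. Writing into a fresh buffer, as proposed in the comment, avoids the problem regardless of the order in which elements are processed.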

grlee77 commented, Feb 6, 2019 (1 reaction)

Many functions such as __mul__ first convert to CSR, so this is pretty problematic.

The only workaround I found so far is to transfer to the host, convert via scipy’s tocsr() and then transfer back to the GPU, but that is obviously not ideal.
