Array indexing of sparse matrices
After talking to @quasiben I've been looking into using CuPy as a shim to support sparse matrices in PyTorch (the current support is not great).
The gist is that I can fit a sparse matrix into GPU memory but not a dense one. So, the idea is to use a CuPy sparse matrix and a custom DataLoader: I should be able to quickly train a model by indexing into the CuPy matrix, densifying a batch of data, and doing a zero-copy conversion to a tensor to feed into my model.
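For the zero-copy step, DLPack should do the trick; a minimal sketch (assuming a CuPy build with ndarray.toDlpack() and PyTorch's torch.utils.dlpack, which is what I'm planning to use):

import cupy
from torch.utils.dlpack import from_dlpack

# a dense CuPy batch on the GPU (stand-in for a densified sparse slice)
batch = cupy.random.rand(32, 100, dtype=cupy.float32)

# zero-copy handoff: the tensor shares the same GPU buffer as `batch`
tensor = from_dlpack(batch.toDlpack())
assert tensor.is_cuda and tensor.shape == (32, 100)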
Unfortunately, it appears that CuPy's sparse matrices don't support array indexing, which would be the natural way to do this (in the sense that PyTorch's DataLoader class returns a list of integers to index into your data). There's definitely a way to get this working with single indices and vstack (sketched below), but I thought I'd raise an issue because scipy does support this indexing and it'd be a lot cleaner for me.
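For reference, the single-index/vstack workaround I mean would look roughly like this (a sketch, assuming cupyx.scipy.sparse.vstack and major-axis slicing behave the way the traceback below suggests):

import cupy.sparse
import cupyx.scipy.sparse
import scipy.sparse

m = cupy.sparse.csr_matrix(scipy.sparse.random(100, 100).tocsr())

# emulate m[[0, 1], :] by stacking single-row (major-axis) slices
rows = [m[i:i + 1] for i in (0, 1)]
batch = cupyx.scipy.sparse.vstack(rows)  # still sparse, still on the GPU
dense = batch.toarray()                  # densify just this batch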
Thanks!
- Conditions (you can just paste the output of python -c 'import cupy; cupy.show_config()')

CuPy Version          : 6.2.0
CUDA Root             : /usr/local/cuda
CUDA Build Version    : 9000
CUDA Driver Version   : 9000
CUDA Runtime Version  : 9000
cuDNN Build Version   : 7402
cuDNN Version         : 7402
NCCL Build Version    : 2307
NCCL Runtime Version  : 2307

- Code to reproduce
import cupy.sparse
import scipy.sparse

scipy_sparse = scipy.sparse.random(100, 100).tocsr()
scipy_sparse[[0, 1], :]  # this works for csr matrices

cupy_sparse = cupy.sparse.csr_matrix(scipy_sparse)
cupy_sparse[[0, 1], :]  # error :(
- Error messages, stack traces, or logs
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-70-425afa4195a2> in <module>
3
4 cupy_sparse = cupy.sparse.csr_matrix(scipy_sparse)
----> 5 cupy_sparse[[0,1],:]
/opt/conda/lib/python3.7/site-packages/cupyx/scipy/sparse/compressed.py in __getitem__(self, slices)
247 return self._get_major_slice(major)
248
--> 249 raise ValueError('unsupported indexing')
250
251 def _get_single(self, major, minor):
ValueError: unsupported indexing
Top GitHub Comments
Thanks for testing this out, @jamestwebber! Now for distributed GPUs 😃
Thanks for fixing this; I started using it about a week ago. I see about a 2.5× speedup in training my model when I can keep it entirely in GPU memory and convert to dense tensors on demand.
I wrote up my (very simple) solution in a gist in case anyone wants to use it.
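(The gist link didn't carry over here, but the solution was roughly shaped like the following. This is a hypothetical reconstruction, not the actual gist; SparseDataset is an illustrative name, and it assumes the list-of-rows indexing this issue asked for:)

import cupy.sparse
import scipy.sparse
from torch.utils.data import DataLoader, Dataset
from torch.utils.dlpack import from_dlpack


class SparseDataset(Dataset):
    """Serve dense GPU tensors out of a CuPy CSR matrix kept on-device."""

    def __init__(self, csr):
        self.csr = csr

    def __len__(self):
        return self.csr.shape[0]

    def __getitem__(self, i):
        return i  # just hand back the index; all GPU work happens in collate()

    def collate(self, indices):
        # array indexing into the sparse matrix, densifying only this batch
        dense = self.csr[indices, :].toarray()
        # zero-copy handoff: the tensor aliases the CuPy memory
        return from_dlpack(dense.toDlpack())


csr = cupy.sparse.csr_matrix(scipy.sparse.random(1000, 100, format='csr'))
dataset = SparseDataset(csr)
loader = DataLoader(dataset, batch_size=32, shuffle=True,
                    collate_fn=dataset.collate)
# each batch yielded by `loader` is a dense CUDA tensor of shape (32, 100)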