
Very slow product for sparse matrices

See original GitHub issue

Description

The matrix product seems very slow for sparse matrices. I'm using two sparse matrices with 99.9% and 95% null entries, and I compute the product between them as shown in the reproducer below.

As the matrices are highly sparse, I'd expect the sparse product to be faster, and certainly not ~500× slower than the corresponding dense product. I've tried CuPy installed from a wheel and then built from source, with both cuSPARSELt 0.1.0 and 0.2.0, and I've timed with both timeit and CuPy's benchmark profiler. The results I get are always the same.

Any hint would be much appreciated! Thanks for your help!

To Reproduce

import cupy
from sklearn.datasets import fetch_20newsgroups
from sklearn.preprocessing import OneHotEncoder
from sklearn.feature_extraction.text import TfidfVectorizer


news = fetch_20newsgroups(subset='train')

x = TfidfVectorizer().fit_transform(news.data)
y = OneHotEncoder().fit_transform(news.target.reshape(-1, 1)).T

x_sparse = cupy.sparse.csr_matrix(x)
y_sparse = cupy.sparse.csr_matrix(y)
    
x_dense = x_sparse.todense()
y_dense = y_sparse.todense()

n_repeat = 10

def prod(x, y):
    return y @ x

from cupyx.profiler import benchmark
print(benchmark(prod, (x_sparse, y_sparse), n_repeat=n_repeat))
print(benchmark(prod, (x_dense, y_dense), n_repeat=n_repeat))

from timeit import timeit
print(timeit(lambda: prod(x_sparse, y_sparse), number=n_repeat) / n_repeat)
print(timeit(lambda: prod(x_dense, y_dense), number=n_repeat) / n_repeat)
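As a CPU-side sanity check (my own addition, not part of the original report), the same sparse-versus-dense comparison can be sketched with SciPy on synthetic matrices of roughly the reported densities (0.1% and 5% non-zeros). The shapes here are placeholders, not the 20newsgroups data:

```python
import numpy as np
from scipy import sparse
from timeit import timeit

rng = np.random.default_rng(0)

# Synthetic stand-ins mirroring the reported densities:
# x has ~0.1% non-zeros (99.9% null), y has ~5% non-zeros (95% null).
x_sparse = sparse.random(2000, 5000, density=0.001, format="csr", random_state=rng)
y_sparse = sparse.random(20, 2000, density=0.05, format="csr", random_state=rng)

x_dense = x_sparse.toarray()
y_dense = y_sparse.toarray()

n_repeat = 10
t_sparse = timeit(lambda: y_sparse @ x_sparse, number=n_repeat) / n_repeat
t_dense = timeit(lambda: y_dense @ x_dense, number=n_repeat) / n_repeat

print(f"sparse @ sparse: {t_sparse * 1e3:.3f} ms")
print(f"dense  @ dense : {t_dense * 1e3:.3f} ms")

# The two products agree, so the timing comparison is apples to apples.
assert np.allclose((y_sparse @ x_sparse).toarray(), y_dense @ x_dense)
```

On the CPU there is no asynchronous kernel launch, so timeit measures the full computation, which is not the case for the CuPy version above.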

Installation

Source (pip install cupy)

Environment

OS                           : Linux-3.10.0-1160.53.1.el7.x86_64-x86_64-with-glibc2.17
Python Version               : 3.8.12
CuPy Version                 : 10.1.0
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.21.2
SciPy Version                : 1.6.2
Cython Build Version         : 0.29.27
Cython Runtime Version       : 0.29.27
CUDA Root                    : /usr/local/cuda
nvcc PATH                    : /usr/local/cuda/bin/nvcc --compiler-bindir gcc
CUDA Build Version           : 11050
CUDA Driver Version          : 11050
CUDA Runtime Version         : 11050
cuBLAS Version               : (available)
cuFFT Version                : 10600
cuRAND Version               : 10207
cuSOLVER Version             : (11, 3, 2)
cuSPARSE Version             : (available)
NVRTC Version                : (11, 5)
Thrust Version               : 101301
CUB Build Version            : 101301
Jitify Build Version         : <unknown>
cuDNN Build Version          : 8302
cuDNN Version                : 8302
NCCL Build Version           : 21104
NCCL Runtime Version         : 21104
cuTENSOR Version             : 10400
cuSPARSELt Build Version     : 100
Device 0 Name                : NVIDIA A100-SXM4-40GB
Device 0 Compute Capability  : 80
Device 0 PCI Bus ID          : 0000:00:04.0

Additional Information

No response

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
anaruse commented, Feb 23, 2022

Hi @eguidotti

I ran your reproducer on a V100 GPU and got the following results.

prod : CPU:14013.300 us +/- 9.235 (min:13997.990 / max:14027.839) us  GPU-0:40940.442 us +/-12.410 (min:40923.138 / max:40967.167) us
prod : CPU:   33.642 us +/- 3.235 (min:   31.219 / max:   42.260) us  GPU-0:30652.928 us +/- 5.538 (min:30643.200 / max:30659.584) us
0.03822545510047348
2.949680056190118e-05

It looks like you compared CPU time, but in this case I think it is better to compare GPU time. In GPU time, it is about 40.9 ms for the sparse matrices versus 30.7 ms for the dense ones. The sparse product is still slower, though.

1 reaction
emcastillo commented, Feb 15, 2022

What are the sizes of the matrices? If the number of non-null entries is not big enough, you will not see any benefit.

Sorry, I did not realize that you were using a publicly available dataset, I will try to profile this and get back to you.
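Checking the sizes and non-zero counts emcastillo asks about is straightforward with the `shape` and `nnz` attributes of a sparse matrix. A sketch using SciPy with synthetic placeholder matrices (to avoid downloading 20newsgroups here; the shapes are my assumption, only the densities come from the report):

```python
from scipy import sparse

def describe(name, m):
    """Print shape, stored non-zeros, and density of a sparse matrix."""
    rows, cols = m.shape
    density = m.nnz / (rows * cols)
    print(f"{name}: shape={m.shape}, nnz={m.nnz}, density={density:.4%}")

# Placeholder matrices at the densities from the report.
x = sparse.random(2000, 5000, density=0.001, format="csr", random_state=0)
y = sparse.random(20, 2000, density=0.05, format="csr", random_state=0)

describe("x", x)
describe("y", y)
describe("y @ x", y @ x)  # the product is often far denser than its inputs
```

One reason a sparse-times-sparse product can disappoint is visible in the last line: the result of multiplying two sparse matrices is frequently much denser than either input, which erodes the advantage over a dense GEMM.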


