CSC/ CSR classes

Is your feature request related to a problem? Please describe.

Could explicit CSR and CSC classes be added to this library?

I would love to stop using scipy.sparse.spmatrix classes. Their matrix API is a source of constant frustration. The lack of CSC and CSR arrays with an np.ndarray API has been a barrier to adoption, not only of this library but also of xarray and dask, for the scanpy/anndata projects.

Describe the solution you’d like

I would like there to be explicit CSR and CSC classes in this package. These should perform at least as well as scipy.sparse.spmatrix counterparts.

While higher dimensional algorithms are developed, performant implementations for these cases can be taken from scipy.sparse or developed from reference implementations. I think this is a low effort/ high pay-off path towards performant implementations for common formats. They wouldn’t be n-dimensional, but there’s a lot of value in being able to start consolidating efforts to this library before an n-dimensional generalization has the features and performance of the common 2d cases.
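To make concrete what a reference implementation of the 2d case involves, here is a minimal plain-Python sketch of the CSR layout (data, indices, indptr) and a matrix-vector product over it. The function names dense_to_csr and csr_matvec are made up for illustration; this is not code from scipy.sparse or from this library, and a performant version would be written against typed arrays.

```python
# Illustrative only: a plain-Python CSR layout, not scipy.sparse or pydata/sparse code.

def dense_to_csr(rows):
    """Convert a list of equal-length rows into (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)      # stored non-zero value
                indices.append(col)     # its column index
        indptr.append(len(data))        # where this row's slice ends
    return data, indices, indptr


def csr_matvec(data, indices, indptr, vec):
    """Multiply a CSR matrix by a dense vector, visiting only stored values."""
    out = []
    for row in range(len(indptr) - 1):
        start, stop = indptr[row], indptr[row + 1]
        out.append(sum(data[k] * vec[indices[k]] for k in range(start, stop)))
    return out


dense = [
    [1, 0, 2],
    [0, 0, 3],
    [4, 5, 0],
]
data, indices, indptr = dense_to_csr(dense)
result = csr_matvec(data, indices, indptr, [1, 1, 1])
```

CSC is the same structure with the roles of rows and columns swapped, which is part of why the two formats are cheap to support side by side.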

Describe alternatives you’ve considered

No explicit CSC/ CSR classes. These are subsets of more generic structures so why special case them?

I think this is the wrong approach due to how common these structures are. Even a flexible framework like taco exports these formats (pytaco.csc and pytaco.csr).

We could also consider what code looks like if these classes are not defined. All dispatch here could change from isinstance(x, sparse.csr_matrix) to something like (isinstance(x, sparse.GXCS) and x.compressed_axes == (0,) and x.ndim == 2), but something closer to the first case is probably an easier sell.
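The two dispatch styles can be compared side by side. The sketch below uses stand-in classes named GXCS and CSR that are hypothetical and only mimic the attributes mentioned above (compressed_axes, and ndim as on np.ndarray); they are not the library's actual classes.

```python
from dataclasses import dataclass


@dataclass
class GXCS:
    """Hypothetical stand-in for a generic n-dimensional compressed array."""
    shape: tuple
    compressed_axes: tuple

    @property
    def ndim(self):
        return len(self.shape)


class CSR:
    """Hypothetical stand-in for an explicit 2-d, row-compressed class."""

    def __init__(self, shape):
        if len(shape) != 2:
            raise ValueError("CSR is strictly two-dimensional")
        self.shape = shape


def is_csr_generic(x):
    # Dispatch without explicit classes: three conditions on every check.
    return isinstance(x, GXCS) and x.compressed_axes == (0,) and x.ndim == 2


def is_csr_explicit(x):
    # Dispatch with an explicit class: a single isinstance check.
    return isinstance(x, CSR)


row_compressed_2d = GXCS(shape=(3, 4), compressed_axes=(0,))
row_compressed_3d = GXCS(shape=(2, 3, 4), compressed_axes=(0,))
explicit = CSR((3, 4))
```

With the generic form, every call site has to repeat the axis and dimensionality conditions, and forgetting one silently accepts the 3-d array.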

CSC/ CSR classes as thin wrappers of an n-dimensional format

I think this is a great goal. However, it requires an implementation of that format which is at least as performant in most (probably all) cases as the scipy equivalents for broad adoption. As mentioned above, performant implementations can be (1) sourced from a current dependency (scipy.sparse) or (2) easily adapted from existing reference implementations.

Code specialized for these cases could gradually be replaced with more generic versions as they are implemented. However, performant implementation for these common cases would not be blocked by implementation of n-dimensional algorithms.

Additional context

I could expand on why I think this is the right way to go

I could go on about how important I think it is for there to be CSR and CSC classes in this library, and why it’s important to implement these separately from higher dimensional generalizations. I find that makes this post too long. I’d be happy to add these arguments if asked.

Type hierarchy

I don’t know what the appropriate type hierarchy is here. Should issubclass(CSR, GXCS) work? Or should there be an abstract CompressedDimSparseArray parent class for all of these? What is the result type of np.stack called on a list[CSR]?
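As one possible shape for the second option, here is a sketch with an abstract parent and sibling concrete classes. Everything in it is hypothetical: the class bodies are placeholders, and it is meant to show what the issubclass checks would return under that design, not to propose it as the answer.

```python
from abc import ABC, abstractmethod


class CompressedDimSparseArray(ABC):
    """Hypothetical abstract parent shared by all compressed formats."""

    @property
    @abstractmethod
    def compressed_axes(self):
        """Tuple of axes that are stored in compressed form."""


class GXCS(CompressedDimSparseArray):
    """Hypothetical n-dimensional format: compressed axes chosen per instance."""

    def __init__(self, compressed_axes):
        self._compressed_axes = tuple(compressed_axes)

    @property
    def compressed_axes(self):
        return self._compressed_axes


class CSR(CompressedDimSparseArray):
    """Hypothetical 2-d format with rows (axis 0) compressed."""
    compressed_axes = (0,)


class CSC(CompressedDimSparseArray):
    """Hypothetical 2-d format with columns (axis 1) compressed."""
    compressed_axes = (1,)
```

Under this layout issubclass(CSR, CompressedDimSparseArray) holds while issubclass(CSR, GXCS) does not, so code that wants "any compressed format" would check against the abstract parent.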

These are things that could be decided on later, once the n-dimensional compressed representation is public API.

Issue Analytics

Created 3 years ago
Reactions: 2
Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction

hammer commented, Sep 20, 2021

With #442 merged, should this issue be closed, or is there work remaining? (Update: I found https://github.com/pydata/sparse/issues/443, I’ll leave a link here in this comment in case others are as dense as me.)

1 reaction

ivirshup commented, Mar 15, 2021

What would be a good first step here?

I imagine getting the classes to a point where individual features can be developed in separate PRs would be important, so having a stable class structure and the basic parameters down would be valuable.
