Make SparseArray an ExtensionArray
We should make SparseArray a proper ExtensionArray.
This will be difficult to do properly while SparseArray subclasses ndarray. Basic operations like np.asarray(sparse_array) don’t match the behavior the ExtensionArray API requires (https://github.com/pandas-dev/pandas/issues/14167). Fixing this is going to be hard as long as we subclass ndarray, because I can’t override the behavior of np.asarray(sparse_array) in Python.
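A minimal sketch of why the subclass gets in the way, assuming a recent NumPy: for any ndarray subclass, np.asarray returns a plain ndarray view of the existing buffer and never consults a custom __array__ method. The class DenseValuesOnly is hypothetical; it only imitates a layout where the array's buffer holds just the stored values.

```python
import numpy as np


class DenseValuesOnly(np.ndarray):
    """Hypothetical ndarray subclass whose buffer holds only the stored values."""

    def __array__(self, dtype=None, copy=None):
        # We would like np.asarray to densify here, but NumPy does not call
        # this hook for ndarray subclasses; it just views the existing buffer.
        return np.array([0.0, 1.0, 0.0, 2.0])


arr = np.array([1.0, 2.0]).view(DenseValuesOnly)
dense = np.asarray(arr)
print(type(dense).__name__, dense.tolist())  # ndarray [1.0, 2.0]
```

The override is silently ignored, so the "dense" result is really just the stored values.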
So, some questions:
- Do people rely on SparseArray being an ndarray subclass?
- Do we want to make a clean break, or introduce deprecations for things that will need changing (but with no clear upgrade path)?
My current preference is to just break things, but I don’t use sparse myself. SparseArray would then compose an ndarray of dense values and a SparseIndex, but it would no longer subclass ndarray.
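A minimal sketch of the composition idea, assuming the positions are kept as a plain integer array in place of a real SparseIndex. ComposedSparseArray, sp_indices and the constructor parameters are hypothetical names, not pandas’ API. Because the class does not subclass ndarray, np.asarray goes through __array__, so the class controls what the dense representation looks like.

```python
import numpy as np


class ComposedSparseArray:
    """Hypothetical sketch: stored values plus their integer positions.

    Not pandas' implementation; the names are illustrative only.
    """

    def __init__(self, sp_values, sp_indices, length, fill_value=0.0):
        self.sp_values = np.asarray(sp_values, dtype=float)
        self.sp_indices = np.asarray(sp_indices, dtype=np.intp)
        self.length = length
        self.fill_value = fill_value

    def __len__(self):
        return self.length

    def __array__(self, dtype=None, copy=None):
        # Not an ndarray subclass, so np.asarray calls this hook and we can
        # return the densified values: fill everywhere, then scatter values.
        out_dtype = self.sp_values.dtype if dtype is None else dtype
        dense = np.full(self.length, self.fill_value, dtype=out_dtype)
        dense[self.sp_indices] = self.sp_values
        return dense


sparse = ComposedSparseArray([1.0, 2.0], [1, 3], length=4)
print(np.asarray(sparse).tolist())  # [0.0, 1.0, 0.0, 2.0]
```

The trade-off is that anything relying on isinstance(x, np.ndarray) stops working, which is exactly the breakage the questions above ask about.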
CCing some people who seem to use pandas’ sparse data structures: @hexgnu @kernc @Licht-T
Issue Analytics
- Created 5 years ago
- Comments: 16 (16 by maintainers)
Top GitHub Comments
Yeah, they’re not in the official public API, but they were likely not considered in the pandas.core privatization shuffle. So, I don’t really know. I suppose it depends on whether people find them useful (@hexgnu @kernc @Licht-T); otherwise I would default to making them private implementation details of SparseArray.
That makes me wonder: should this be public? More generally, are the sparse index objects considered public? (You can currently pass one to the SparseSeries constructor, but the objects are never used in the docs and are not exposed at the top level.)
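For comparison, a short sketch of how the stored values and the sparse index object are reached in current pandas, assuming pandas 1.x or later (SparseArray has been an ExtensionArray since 0.24). The default kind="integer" produces an IntIndex; printing its module shows where that class actually lives.

```python
import numpy as np
import pandas as pd

arr = pd.arrays.SparseArray([0, 0, 1, 0, 2])

# SparseArray is an ExtensionArray and no longer an ndarray subclass.
print(isinstance(arr, pd.api.extensions.ExtensionArray))  # True
print(isinstance(arr, np.ndarray))  # False

# The two composed parts: stored (non-fill) values and the sparse index.
print(arr.sp_values.tolist())  # [1, 2]
print(type(arr.sp_index).__name__)  # IntIndex
print(arr.sp_index.indices.tolist())  # [2, 4]
print(type(arr.sp_index).__module__)

# np.asarray now returns the dense values.
print(np.asarray(arr).tolist())  # [0, 0, 1, 0, 2]
```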