
How should xarray use/support sparse arrays?


I’m looking forward to being able to easily create sparse xarray objects from pandas: https://github.com/pydata/xarray/issues/3206

Are there other xarray APIs that could make good use of sparse arrays, or could make sparse arrays easier to use?

Some ideas:

  • to_sparse()/to_dense() methods for converting to/from sparse arrays without having to go through .data
  • to_dataframe()/to_series() could grow options for skipping the fill-value in sparse arrays, so they can round-trip MultiIndex data back to pandas
  • Serialization to/from netCDF files, using some custom convention (see https://github.com/pydata/xarray/issues/1375#issuecomment-402699810)

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 12
  • Comments: 47 (18 by maintainers)

Top GitHub Comments

2 reactions
Material-Scientist commented, Jan 16, 2022

I would prefer to keep the dense representation, but with tricks that keep the data in memory as a sparse type.

Look at the following example with a pandas MultiIndex and sparse dtype: [image]

The dense data uses ~40 MB of memory, while the dense representation with sparse dtypes uses only ~0.5 kB of memory!
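The screenshot is not preserved, so here is a sketch of the same idea with illustrative data (not the commenter's): unstack a sparse MultiIndex Series into a wide, NaN-filled frame, then store its columns with `pd.SparseDtype`. The exact byte counts depend on the data and the pandas version:

```python
import numpy as np
import pandas as pd

# Illustrative data: 1,000 x 1,000 possible (time, price) pairs,
# but only 1,000 of them are observed.
rng = np.random.default_rng(0)
n = 1_000
index = pd.MultiIndex.from_arrays(
    [np.arange(n), rng.permutation(n)], names=["time", "price"]
)
long = pd.Series(rng.random(n), index=index, name="volume")

# Unstacking gives the dense (wide) layout; missing pairs become NaN.
wide_dense = long.unstack("price")

# Same layout, but each column is a SparseArray that stores only the
# non-NaN values and their positions.
wide_sparse = wide_dense.astype(pd.SparseDtype("float64", np.nan))

dense_bytes = wide_dense.memory_usage(deep=True).sum()
sparse_bytes = wide_sparse.memory_usage(deep=True).sum()
print(f"dense:   {dense_bytes / 1e6:.1f} MB")
print(f"sparse:  {sparse_bytes / 1e3:.1f} kB")
print(f"density: {wide_sparse.sparse.density:.4f}")  # 0.0010
```

The frame keeps its familiar wide shape and can be inspected like any other DataFrame, while memory scales with the number of observed pairs rather than with the full grid.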

And while you can import DataFrames with the sparse=True keyword, the size seems to be displayed inaccurately (both show the same size?), and we cannot examine the data the way we can with a pandas MultiIndex plus sparse dtype: [image]

Besides, many operations are not available on sparse xarray data variables (e.g. if I wanted to group by price level for ffill and downsampling): [image]

So, it would be nice if xarray adopted pandas’ approach of unstacking sparse data.

In the end, you could extract all the non-NaN values and write them to a sparse storage format, such as TileDB sparse arrays. cc: @stavrospapadopoulos
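Collecting only the non-NaN cells can be sketched without any storage backend. The helper below is hypothetical: it turns a dense, NaN-padded DataArray into a coordinate/value table, the kind of input a sparse store typically ingests. Writing to TileDB itself is not shown, and the helper assumes the DataArray is NumPy-backed:

```python
import numpy as np
import pandas as pd
import xarray as xr


def nonnull_entries(da: xr.DataArray) -> pd.DataFrame:
    """Hypothetical helper: one row per non-NaN cell of a dense DataArray,
    holding that cell's coordinate labels and its value."""
    values = np.asarray(da.values)
    positions = np.nonzero(~pd.isna(values))  # one array of indices per dim
    columns = {
        dim: np.asarray(da.get_index(dim))[pos]
        for dim, pos in zip(da.dims, positions)
    }
    columns[da.name or "value"] = values[positions]
    return pd.DataFrame(columns)


da = xr.DataArray(
    [[1.0, np.nan], [np.nan, 4.0]],
    dims=("time", "price"),
    coords={"time": [0, 1], "price": [100.0, 100.5]},
    name="volume",
)
print(nonnull_entries(da))
```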

2 reactions
fjanoos commented, Aug 30, 2019

Would it be possible for pd.{Series, DataFrame}.to_xarray() to automatically create a sparse DataArray, or to have a flag in to_xarray that controls this? I have a very sparse DataFrame, and every time I try to convert it to xarray I blow out my memory. Keeping it sparse but logically a DataArray would be fantastic.


Top Results From Across the Web

Sparse arrays and the CESM land model component
Usually we work with Xarray wrapping a Dask array which in turn uses NumPy arrays for each block; or just Xarray wrapping NumPy...
How to make use of xarray's sparse functionality when ...
Either: arr_a = arr_a.map_blocks(sparse.COO) arr_b = arr_b.map_blocks(sparse.COO). Or: xr1 = xarray.apply_ufunc(sparse.
xarray.DataArray.from_series
If the series's index is a MultiIndex, it will be expanded into a tensor product ... If sparse=True, creates a sparse array instead...
Construct Sparse Arrays - PyData/Sparse
You can construct COO arrays from coordinates and value data. ... Each row of coords contains one dimension of the desired sparse array,...
Sparse Arrays - Dask documentation
By swapping out in-memory NumPy arrays with in-memory sparse arrays, we can reuse the blocked algorithms of Dask's Array to achieve parallel and...
