Combining dask with sparse arrays

Hi @hameerabbasi,

This is more of a discussion than an issue, so if this isn’t the right place for it, let me know. I thought it might still be valuable to raise. Recently, I’ve been thinking about how to do distributed skeleton analysis with skan and Dask. Skan uses the sparse library under the hood.

Some things we considered that don’t work so well:

  • First, we thought about processing the input data bit by bit and combining many sparse arrays into a bigger one at the end. But this didn’t seem like a good idea: indptr scales linearly with the size of the largest index, regardless of how few non-zero values the array contains.

  • Next, we thought about passing output from dask directly into the construction of the sparse array (e.g., sparse.coo_matrix((data, (row, col)))). However, this isn’t currently possible. At line 148 of /sparse/coo.py, in the __init__ function, the operator.index function doesn’t know how to handle dask arrays (that’s fair enough).
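
To illustrate the indptr concern from the first bullet, here is a minimal sketch. It assumes the `sparse` in the example call above refers to `scipy.sparse`, since `coo_matrix` is a SciPy constructor; the matrix size and the two entries are made up for illustration.

```python
import numpy as np
from scipy import sparse

# Hypothetical data: only two non-zero entries, but a very large index range.
n = 1_000_000
row = np.array([0, n - 1])
col = np.array([1, 2])
data = np.array([1.0, 1.0])

coo = sparse.coo_matrix((data, (row, col)), shape=(n, n))
csr = coo.tocsr()

# COO storage grows with the number of non-zeros ...
print("stored entries:", coo.nnz)
# ... while the CSR indptr array always has one slot per row, plus one.
print("indptr length:", csr.indptr.size)
```

In this sketch the COO arrays stay tiny, while the CSR `indptr` array is as long as the number of rows plus one, no matter how sparse the matrix is.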

What we have that kinda works:

  • Currently we rely on the (faulty) assumption that the data values will be sparse/small enough that we can bring the results of the dask computation into memory as a numpy array. Then we create the sparse array from that. If you’re curious, you can take a look at this gist to see (scroll right to the end for the sparse array construction).
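
The workaround described above might look roughly like the following sketch. It is not the code from the gist: the small `row`, `col`, and `data` arrays are hypothetical stand-ins for the lazily computed results, and `scipy.sparse.coo_matrix` is assumed as the sparse constructor.

```python
import dask
import dask.array as da
import numpy as np
from scipy import sparse

# Hypothetical stand-ins for lazily computed edge data (dask arrays).
row = da.from_array(np.array([0, 1, 2, 2]), chunks=2)
col = da.from_array(np.array([1, 2, 0, 3]), chunks=2)
data = da.from_array(np.array([1.0, 2.0, 3.0, 4.0]), chunks=2)

# Bring everything into memory as NumPy arrays in a single compute call.
# This is the step that relies on the results being small enough to fit.
row_np, col_np, data_np = dask.compute(row, col, data)

# Build the sparse matrix from the in-memory arrays.
graph = sparse.coo_matrix((data_np, (row_np, col_np)), shape=(4, 4)).tocsr()
```

Calling `dask.compute` once on all three arrays, rather than calling `.compute()` on each, lets dask share any common intermediate work between them.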

Do you happen to have any advice? Is this type of workflow something that fits with the overall project goals of sparse?

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

1 reaction
GenevieveBuckley commented, May 13, 2021

Ok. In the case of an image skeleton to graph adjacency matrix, it might complicate things slightly. Results from a single chunk in the image need to be spread out over more than one chunk in the adjacency matrix. Possibly we’d want to collect the results from all chunks, then re-order them (dask really doesn’t do argsort though) and then put them into the graph adjacency matrix.

This has been a good discussion, thanks!
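
The collect-then-reorder idea from the comment above could be sketched as follows, once the per-chunk results are in memory. The `chunk_results` list and `n_nodes` are hypothetical; the sketch assumes each chunk yields (row, col, value) triples that already use global node indices, and it uses NumPy and `scipy.sparse` rather than dask for the final step.

```python
import numpy as np
from scipy import sparse

# Hypothetical per-chunk results: (row, col, value) arrays in global indices.
# Edges found in one image chunk may land anywhere in the adjacency matrix.
chunk_results = [
    (np.array([3, 0]), np.array([0, 1]), np.array([1.0, 1.0])),
    (np.array([2, 1]), np.array([3, 2]), np.array([1.0, 1.0])),
]

# Collect the results from all chunks into flat arrays.
row = np.concatenate([r for r, _, _ in chunk_results])
col = np.concatenate([c for _, c, _ in chunk_results])
data = np.concatenate([d for _, _, d in chunk_results])

# Re-order by row, then by column (the last key is the primary sort key).
order = np.lexsort((col, row))
row, col, data = row[order], col[order], data[order]

# Put the re-ordered entries into the graph adjacency matrix.
n_nodes = 4
adjacency = sparse.coo_matrix((data, (row, col)), shape=(n_nodes, n_nodes)).tocsr()
```

SciPy's COO constructor accepts entries in any order, so the explicit sort is shown only to mirror the re-ordering step mentioned in the comment.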

1 reaction
hameerabbasi commented, May 3, 2021

Honestly, this is the first time someone confused the two.

Top Results From Across the Web

Sparse Arrays - Dask documentation
Dask's Array supports mixing different kinds of in-memory arrays. This relies on the in-memory arrays knowing how to interact with each other when...

How to make use of xarray's sparse functionality when ...
After combining/merging both DataArrays into a Dataset, the second variable xr2 is sparse and filled with NaNs. That is actually great, but each ...

Minimal example with sparse arrays · Issue #2562 · dask/dask
I was wondering if there is a minimal example of using sparse arrays with Dask, along the lines of, import dask.array as da...

Parallel computing with Dask - Xarray
Dask divides arrays into many small pieces, called chunks, each of which is presumed to be small enough to fit into memory. Unlike...

Parallel computing with Dask — xarray 0.14.1 documentation
What is a Dask array? Dask divides arrays into many small pieces, called chunks, each of which is presumed to be small...