Allowing setitem-like operation on dask array

Although stack and concatenate are useful for combining arrays, they sometimes do not fit the data I have or take a significant amount of work to use, for instance when combining blocks of different data. In cases like these, it would be nice to be able to use array assignment. Dask builds graphs of pure operations (with few exceptions), and assignment is impure. Still, one could imagine an array-like object that translates assignments into slicing and stacking/concatenating. A user could then write __setitem__-like syntax, and the result would be a new dask array (or potentially a modified graph of the existing one), so the net effect behaves like assignment while remaining pure.
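
To make the idea concrete, here is a minimal sketch of a slice assignment rewritten as slicing plus concatenation. It uses NumPy rather than dask so that it runs without a scheduler. The helper name assign_slice is hypothetical, and the sketch assumes a 1-D array with 0 <= start <= stop <= len(x). It is not how dask implements assignment.

```python
import numpy as np


def assign_slice(x, start, stop, value):
    """Return a new 1-D array equal to x with x[start:stop] replaced by value.

    Hypothetical helper: the result is built only from slicing and
    concatenation, so x itself is never modified.
    Assumes 0 <= start <= stop <= len(x).
    """
    # Broadcast scalars (or length-1 values) to the width of the target slice.
    value = np.broadcast_to(value, (stop - start,))
    return np.concatenate([x[:start], value, x[stop:]])


x = np.arange(6)
z = assign_slice(x, 2, 4, np.array([-1, -2]))
print(z)  # [ 0  1 -1 -2  4  5]
print(x)  # [0 1 2 3 4 5], the input is unchanged
```

Because every step is a pure operation, the same pattern could in principle be expressed as graph nodes instead of an in-place write.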

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Reactions: 1
  • Comments: 21 (18 by maintainers)

Top GitHub Comments

2 reactions
davidhassell commented, Apr 13, 2021

Thanks, @jakirkham.

This is largely supported after PR #7033.

Just for the record, in case it’s useful to those who come across this issue in the future: the PR that finally supported this turned out to be #7393, after flaws in #7033 were uncovered.

2 reactions
shoyer commented, Jul 2, 2020

There are at least two parts to this issue:

  1. Modifying dask arrays in-place
  2. “Scatter”-type operations that perform the NumPy equivalent of z = x.copy(); z[i] = y; return z

Part (1) is arguably the most problematic for dask, because array properties like chunks are expected to be immutable.

Part (2) is the functionality we really need, regardless of how it’s spelled. JAX uses the notation z = x.at[i].set(y).

I believe it could be significantly easier to implement (2) in dask without the baggage of mutable __setitem__ syntax, e.g., so we can feel free to change chunk sizes as appropriate.
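
As a rough illustration of part (2), the sketch below wraps the copy-then-assign pattern in a small helper so it can be spelled in the style of JAX's x.at[i].set(y). It uses NumPy only. The names at, _At, and _AtIndex are hypothetical and are not part of NumPy, dask, or JAX.

```python
import numpy as np


class _AtIndex:
    """Hypothetical helper: remembers an array and an index for a pure update."""

    def __init__(self, array, index):
        self._array = array
        self._index = index

    def set(self, value):
        # NumPy equivalent of: z = x.copy(); z[i] = y; return z
        out = self._array.copy()
        out[self._index] = value
        return out


class _At:
    """Hypothetical helper: captures the index passed in square brackets."""

    def __init__(self, array):
        self._array = array

    def __getitem__(self, index):
        return _AtIndex(self._array, index)


def at(array):
    """Entry point for the at(x)[i].set(y) spelling used in this sketch."""
    return _At(array)


x = np.zeros((2, 3))
z = at(x)[0, 1:].set(7.0)
print(z)  # first row becomes [0. 7. 7.], second row stays zero
print(x)  # still all zeros, the input is unchanged
```

Since set always returns a new array, nothing about the original (including, in dask's case, its chunks) has to be mutated, which is the property the comment above argues for.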

It is of course always possible to translate __setitem__ into __getitem__ in user code, but there are a number of cases where this syntax is much more natural. Notable examples include:

Read more comments on GitHub >
