Allowing setitem-like operation on dask array
Even though `stack` and `concatenate` are nice for combining arrays, sometimes they don’t fit the data I have or require a significant amount of work to use, for instance when combining blocks of different data. In cases like these, it would be nice to be able to use array assignment. While it is true that `dask` creates graphs of pure operations (with few exceptions) and assignment is impure, one could imagine creating an array-like object that translates assignments into slicing and stacking/concatenating. This would allow a user to make use of a `__setitem__`-like syntax, but result in creating a new dask array (or potentially modifying the graph of the existing one), so the net result behaves like assignment while remaining pure.
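To make the idea concrete, here is a minimal sketch of how an assignment to a contiguous slice along the first axis could be expressed as slicing plus concatenation. The helper name `assign_block` and its `xp` parameter are assumptions made for illustration and are not part of dask or NumPy. The demo runs on NumPy so that it is self-contained.

```python
import numpy as np


def assign_block(xp, x, start, stop, block):
    """Return a new array equal to `x` with rows start:stop replaced by `block`.

    Hypothetical helper for illustration only; not part of dask or NumPy.
    `xp` is the array namespace to use (numpy in this demo).
    Assumes 0 <= start <= stop <= x.shape[0].
    """
    block = xp.asarray(block)
    if block.shape != x[start:stop].shape:
        raise ValueError("block shape does not match the target slice")
    # Keep the rows before and after the target slice and splice the new
    # block in between; `x` itself is never modified.
    return xp.concatenate([x[:start], block, x[stop:]], axis=0)


x = np.zeros((6, 4))
y = np.ones((2, 4))

z = assign_block(np, x, 2, 4, y)  # behaves like: z = x.copy(); z[2:4] = y
```

Since `dask.array` also provides `asarray` and `concatenate` and supports the same slicing, passing `dask.array` as `xp` with dask inputs should build the equivalent lazy graph. A real implementation would also need to handle other axes, fancy indexing, and rechunking.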
Issue Analytics
- State:
- Created 7 years ago
- Reactions:1
- Comments:21 (18 by maintainers)
Top GitHub Comments
Thanks, @jakirkham.
Just for the record, in case it’s useful to those who come across this issue in the future: the PR that finally supported this turned out to be #7393, after flaws in #7033 were uncovered.
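There are at least two parts to this issue:
1. Supporting the mutable `__setitem__` syntax, `x[i] = y`.
2. Supporting the equivalent pure operation: `z = x.copy(); z[i] = y; return z`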
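The copy-then-assign line above can be written out as a small pure function. This is a minimal NumPy sketch. The name `set_at` is hypothetical, and a dask version would presumably build a new graph rather than copy eagerly.

```python
import numpy as np


def set_at(x, i, y):
    """Pure counterpart of ``x[i] = y``: returns a new array, leaves `x` alone.

    Hypothetical helper name, for illustration only.
    """
    z = x.copy()
    z[i] = y
    return z


x = np.arange(5)
z = set_at(x, slice(1, 3), 0)

print(x)  # [0 1 2 3 4]  (unchanged)
print(z)  # [0 0 0 3 4]
```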
Part (1) is arguably the most problematic for dask, because array properties like `chunks` are expected to be immutable.

Part (2) is the functionality we really need, regardless of how it’s spelled. JAX uses the notation `z = x.at[i].set(y)`.

I believe it could be significantly easier to implement (2) in dask without the baggage of mutable `__setitem__` syntax, e.g., so we can feel free to change chunk sizes as appropriate.

It is of course always possible to translate `__setitem__` into `__getitem__` in user code, but there are a number of cases where this syntax is much more natural. Notable examples include:
- `x[i] += y` (handling repeated indices `i`, which is implemented in NumPy as `np.add.at`).
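The repeated-indices case is easy to get wrong, so a short NumPy illustration may help. With fancy indexing, `a[i] += y` is buffered and applies a repeated index only once, whereas `np.add.at` accumulates every occurrence. The variable names are arbitrary.

```python
import numpy as np

i = np.array([0, 0, 2])   # index 0 appears twice
y = np.array([1, 1, 1])

a = np.zeros(3, dtype=int)
a[i] += y                 # buffered: the repeated index is applied once

b = np.zeros(3, dtype=int)
np.add.at(b, i, y)        # unbuffered: every occurrence is accumulated

print(a)  # [1 0 1]
print(b)  # [2 0 1]
```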