Use of `da.where` before `da.einsum` on a chunked array produces incorrectly sized result
What happened:
Using `dask.array.where` before `dask.array.einsum` on some chunked arrays produces a result of the wrong size. This seems to occur only when the array is chunked along the dimensions that einsum retains in its output. Calling einsum without first calling `where` produces the correct result. Note that this happens even when `where` does not change any values in the array.
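To make the reported condition concrete, the sketch below works out which axes einsum keeps for a given subscripts string and whether a dask-style chunks tuple splits any of them into more than one block. Both helper names are hypothetical (they are not part of dask), the parser assumes an explicit `->` output, and it only encodes the condition as described above; it does not detect the bug itself.

```python
def einsum_retained_axes(subscripts):
    """For each operand, return the axis indices whose labels appear in the output.

    Assumes an explicit output, e.g. 'xj,yj->xy' (no implicit mode, no '...').
    """
    inputs, output = subscripts.replace(" ", "").split("->")
    return [
        [i for i, label in enumerate(labels) if label in output]
        for labels in inputs.split(",")
    ]


def chunked_along_retained_axes(subscripts, chunks_per_operand):
    """True if any operand has more than one chunk along an axis einsum keeps.

    `chunks_per_operand` holds one dask-style chunks tuple per operand,
    e.g. ((2, 1), (2,)) for a 3x2 array split into two row blocks.
    """
    retained = einsum_retained_axes(subscripts)
    for axes, chunks in zip(retained, chunks_per_operand):
        if any(len(chunks[axis]) > 1 for axis in axes):
            return True
    return False


chunks = ((2, 1), (2,))  # the chunking used in the MCVE
print(einsum_retained_axes("xj,yj->xy"))                           # [[0], [0]]
print(chunked_along_retained_axes("xj,yj->xy", [chunks, chunks]))  # True
```

For `'xj,yj->xy'` the contracted label `j` is dropped, so only axis 0 of each operand is retained, and the MCVE's `(2, 1)` row chunking splits exactly that axis.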
What you expected to happen:
Use of dask.array.where shouldn’t affect the result shape of dask.array.einsum.
Minimal Complete Verifiable Example:
```python
>>> import numpy as np
>>> import dask.array as da
>>> a = da.asarray(
...     [
...         [1. , 0. ],
...         [0. , 1. ],
...         [0.5, 0.5],
...     ],
...     chunks=((2, 1), (2,)),
... )
>>> # a == -1 is never true, so this where() leaves every value unchanged
>>> b = da.where(a == -1, a, a)
>>> assert a.chunks == b.chunks
>>> np.testing.assert_array_equal(a, b)
>>> da.einsum('xj,yj->xy', a, a).compute()
array([[1. , 0. , 0.5],
       [0. , 1. , 0.5],
       [0.5, 0.5, 0.5]])
>>> da.einsum('xj,yj->xy', b, b).compute()  # wrong: 4x4 instead of 3x3
array([[1. , 0. , 1. , 0. ],
       [0. , 1. , 0. , 1. ],
       [0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5]])
```
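The expected 3x3 result can be reproduced without dask. This NumPy-only sketch (an illustration, not part of the original report) applies the same no-op `where` and the same subscripts; `'xj,yj->xy'` contracts over the shared label `j`, so it is equivalent to `a @ a.T`, and any dask result whose shape differs from NumPy's points to the bug.

```python
import numpy as np

a = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
])

# The condition a == -1 is never true, so where() returns the values of `a`.
b = np.where(a == -1, a, a)
assert np.array_equal(a, b)

# 'xj,yj->xy' sums over j, i.e. it computes a @ a.T, so the output
# has shape (rows of a) x (rows of a) = 3 x 3.
expected = a @ a.T
result = np.einsum('xj,yj->xy', b, b)
print(result.shape)  # (3, 3)
print(result)
```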
Environment:
- Dask version: 2021.9.0 (pypi)
- Python version: 3.9.6 (conda)
- Operating System: Ubuntu 21.04
- Install method: pip within conda env

I'm not sure which PR fixed this (there have been a few touching this code recently), but the test example now passes on main.
The issue seems to be resolved; closing.
Just for reference, the fix appears to be in dask 2022.01.0.

Edit: digging a little deeper, it was fixed in #8542 and seems to have been a symptom of duplicate dependency names in `rewrite_blockwise`, though I can't quite grok the details.
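If code has to run on older environments, a version guard can flag installs that predate the fix. This is a minimal sketch under the assumption, taken from the comment above, that the fix first shipped in dask 2022.01.0; the helper names are hypothetical, and because the release that introduced the bug is not stated, the check can only say an install *may* be affected. In practice you would pass it `dask.__version__`.

```python
import re

# Assumption: per the comment above, the fix first shipped in dask 2022.01.0.
FIRST_FIXED = (2022, 1, 0)


def parse_calver(version):
    """Parse the leading YYYY.M.P part of a dask version string into ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def may_be_affected(version):
    """True if `version` predates the release that reportedly contains the fix."""
    return parse_calver(version) < FIRST_FIXED


print(may_be_affected("2021.9.0"))   # True
print(may_be_affected("2022.01.0"))  # False
```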