Multidimensional dask coordinates unexpectedly computed


MCVE Code Sample

from dask.diagnostics import ProgressBar
import xarray as xr
import numpy as np
import dask.array as da

# Two identical arrays, each carrying a 2D dask-backed "lons" coordinate
a = xr.DataArray(da.zeros((10, 10), chunks=2), dims=('y', 'x'),
                 coords={'y': np.arange(10), 'x': np.arange(10),
                         'lons': (('y', 'x'), da.zeros((10, 10), chunks=2))})
b = xr.DataArray(da.zeros((10, 10), chunks=2), dims=('y', 'x'),
                 coords={'y': np.arange(10), 'x': np.arange(10),
                         'lons': (('y', 'x'), da.zeros((10, 10), chunks=2))})

with ProgressBar():
    # Should only build a lazy graph, but the ProgressBar reports a compute
    c = a + b

Output:

[########################################] | 100% Completed |  0.1s

Problem Description

Using arrays with 2D dask array coordinates causes those coordinates to be computed for any binary operation (anything combining two or more DataArrays). In the example above, ProgressBar is used to show when the coordinates are being computed.
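ProgressBar is one of dask's local-scheduler callbacks, so the same hook can report a number instead of drawing a bar, which is easier to assert on. The following is a minimal sketch: `ComputeCounter` is a hypothetical helper name, and it assumes a local (threaded or synchronous) scheduler, because these callbacks are not fired by dask.distributed.

```python
import dask.array as da
from dask.callbacks import Callback


class ComputeCounter(Callback):
    """Hypothetical helper: counts how many graphs a local dask scheduler starts."""

    def __init__(self):
        super().__init__()
        self.n_computes = 0

    def _start(self, dsk):
        # Invoked once at the beginning of every compute on a local scheduler
        self.n_computes += 1


x = da.zeros((10, 10), chunks=2)

with ComputeCounter() as counter:
    y = x + 1  # building the graph is lazy, nothing runs yet
print(counter.n_computes)  # 0

with ComputeCounter() as counter:
    y.sum().compute()  # one explicit compute
print(counter.n_computes)  # 1
```

Wrapping `c = a + b` from the example in `ComputeCounter()` instead of `ProgressBar()` would show a non-zero count whenever the coordinates get computed.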

In my own work, once I learned that 2D dask coordinates were possible, I started adding longitude and latitude coordinates. These are rather large and can take a while to load or compute, so I was surprised that simple operations (e.g. a.fillna(b)) caused them to be computed and took a long time.
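One possible workaround, not taken from the issue thread, is to drop the 2D coordinate before combining arrays, so the operation only has the 1D index coordinates to align, and then reattach it from one of the inputs. This is a sketch assuming a recent xarray (`DataArray.drop_vars` arrived in a later release than the 0.12.1 listed below) and assuming both inputs really share the same `lons` values, because with this approach xarray no longer compares them for you. `make_array` is a hypothetical helper that repeats the MCVE construction.

```python
import dask.array as da
import numpy as np
import xarray as xr


def make_array():
    # Same construction as the MCVE: dask data plus a 2D dask "lons" coordinate
    return xr.DataArray(
        da.zeros((10, 10), chunks=2),
        dims=("y", "x"),
        coords={
            "y": np.arange(10),
            "x": np.arange(10),
            "lons": (("y", "x"), da.zeros((10, 10), chunks=2)),
        },
    )


a = make_array()
b = make_array()

# Combine without the 2D coordinate: only the y/x indexes are aligned
c = a.drop_vars("lons") + b.drop_vars("lons")

# Reattach the lazy coordinate from one input; b's "lons" values are NOT checked
c = c.assign_coords(lons=(("y", "x"), a["lons"].data))
```

The trade-off is that a genuine mismatch between the two `lons` coordinates would go unnoticed.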

Is this computation by design or a possible bug?

Expected Output

No output from the ProgressBar, i.e. no coordinates computed or loaded.
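To turn this expectation into a check that fails loudly rather than relying on reading ProgressBar output, one option is to install a custom scheduler via `dask.config.set(scheduler=...)` that counts computations and raises once a limit is exceeded. This sketch is in the spirit of the `raise_if_dask_computes` helper in xarray's test suite, not a copy of it; `CountingScheduler` and `max_computes` are illustrative names.

```python
import dask
import dask.array as da
from dask.local import get_sync


class CountingScheduler:
    """Illustrative dask scheduler that counts computes and fails past a limit."""

    def __init__(self, max_computes=0):
        self.total_computes = 0
        self.max_computes = max_computes

    def __call__(self, dsk, keys, **kwargs):
        self.total_computes += 1
        if self.total_computes > self.max_computes:
            raise RuntimeError(
                f"Too many dask computations: {self.total_computes} > {self.max_computes}"
            )
        # Execute the graph with dask's single-threaded synchronous scheduler
        return get_sync(dsk, keys, **kwargs)


x = da.zeros((10, 10), chunks=2)
sched = CountingScheduler(max_computes=0)
with dask.config.set(scheduler=sched):
    y = x + x  # lazy graph construction is allowed
    try:
        y.sum().compute()  # any compute is rejected
    except RuntimeError as err:
        print(err)  # Too many dask computations: 1 > 0
```

Running `c = a + b` from the example inside the same `dask.config.set` block would raise if the coordinates were being computed.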

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 02:16:08) [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 18.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.12.1
pandas: 0.24.2
numpy: 1.14.3
scipy: 1.3.0
netCDF4: 1.5.1.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: 1.0.22
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.0.0
distributed: 2.0.0
matplotlib: 3.1.0
cartopy: 0.17.1.dev147+HEAD.detached.at.5e624fe
seaborn: None
setuptools: 41.0.1
pip: 19.1.1
conda: None
pytest: 4.6.3
IPython: 7.5.0
sphinx: 2.1.2

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

dhirschfeld commented, Jul 1, 2019 (1 reaction)

djhoese commented, Jul 5, 2019 (0 reactions)

Ah, good call. The transpose currently in xarray would still be a problem though.
