Challenges running xarray-wrapped netCDF files

See original GitHub issue

This is a traceback from calling .compute() on an xarray computation running on dask.distributed.

We’re able to use dask.array on a raw netCDF4 object without locks as long as each worker runs a single thread. However, when computing on the .data attribute of a NetCDF object wrapped by a few xarray containers, we run into the following error. Oddly, it appears to come from computing the shape. Traceback below:

cc @mrocklin @shoyer

In [168]: ds = xr.open_mfdataset(fname, lock=False)

In [169]: ds.yParticle.data.sum().compute()
/net/scratch3/pwolfram/miniconda2/lib/python2.7/site-packages/dask/array/core.pyc in getarray()
     47         lock.acquire()
     48     try:
---> 49         c = a[b]
     50         if type(c) != np.ndarray:
     51             c = np.asarray(c)

/users/pwolfram/lib/python2.7/site-packages/xarray/core/indexing.pyc in __getitem__()
    396 
    397     def __getitem__(self, key):
--> 398         return type(self)(self.array, self._updated_key(key))
    399 
    400     def __setitem__(self, key, value):

/users/pwolfram/lib/python2.7/site-packages/xarray/core/indexing.pyc in _updated_key()
    372 
    373     def _updated_key(self, new_key):
--> 374         new_key = iter(canonicalize_indexer(new_key, self.ndim))
    375         key = []
    376         for size, k in zip(self.array.shape, self.key):

/users/pwolfram/lib/python2.7/site-packages/xarray/core/utils.pyc in ndim()
    380     @property
    381     def ndim(self):
--> 382         return len(self.shape)
    383 
    384     @property

/users/pwolfram/lib/python2.7/site-packages/xarray/core/indexing.pyc in shape()
    384     def shape(self):
    385         shape = []
--> 386         for size, k in zip(self.array.shape, self.key):
    387             if isinstance(k, slice):
    388                 shape.append(len(range(*k.indices(size))))

/users/pwolfram/lib/python2.7/site-packages/xarray/conventions.pyc in shape()
    447     @property
    448     def shape(self):
--> 449         return self.array.shape[:-1]
    450 
    451     def __str__(self):

/users/pwolfram/lib/python2.7/site-packages/xarray/core/indexing.pyc in shape()
    384     def shape(self):
    385         shape = []
--> 386         for size, k in zip(self.array.shape, self.key):
    387             if isinstance(k, slice):
    388                 shape.append(len(range(*k.indices(size))))

/users/pwolfram/lib/python2.7/site-packages/xarray/core/utils.pyc in shape()
    407     @property
    408     def shape(self):
--> 409         return self.array.shape
    410 
    411     def __array__(self, dtype=None):

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.shape.__get__ (netCDF4/_netCDF4.c:32778)()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._getdims (netCDF4/_netCDF4.c:31870)()

RuntimeError: NetCDF: Not a valid ID
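
For reference, the same RuntimeError can be reproduced without dask or xarray at all, which makes me suspect the netCDF file handle backing the wrapped variable is no longer valid by the time the shape is computed. A minimal standalone sketch (the file and variable names below are placeholders, not taken from this run):

# Standalone reproduction of the "Not a valid ID" error class: once the
# netCDF4.Dataset handle is closed (or otherwise invalidated), even asking a
# Variable for its shape fails, exactly as in the traceback above.
import netCDF4

nc = netCDF4.Dataset('lagrPartTrack.0000.nc')   # placeholder file name
v = nc.variables['yParticle']                   # placeholder variable name
nc.close()                                      # the underlying handle is now invalid

v.shape  # raises RuntimeError: NetCDF: Not a valid ID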

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Comments: 48 (20 by maintainers)

Top GitHub Comments

2 reactions
pwolfram commented, Nov 8, 2016

@shoyer and @mrocklin, this looks like it is working now using pydata/xarray#1095:

In [1]: from dask.distributed import Client

In [2]: client = Client('wf609:8786')

In [3]: client
Out[3]: <Client: scheduler="wf609:8786" processes=2 cores=32>

In [5]: import dask.array as da

In [6]: import xarray as xr

In [7]: ds = xr.open_mfdataset('fname', lock=False)

In [8]: x = ds.yParticle.data

In [9]: x.sum().compute()
Out[9]: 31347046718055.527

In [10]: ds = xr.open_mfdataset('./lagrPartTrack.*.nc', lock=False)

In [11]: x = ds.yParticle.data

In [12]: x.sum().compute()
Out[12]: 525875176622133.69

Does this suggest that xarray on dask.distributed is now a reality? If so, I’ll try something more complex when I get the time tomorrow.
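
For anyone who wants to repeat this check without access to our scheduler, a minimal local sketch of the same workflow (assuming the same xarray/dask versions as above, i.e. including pydata/xarray#1095; the file pattern is a placeholder):

# Local stand-in for the session above: a local cluster with single-threaded
# workers instead of the 'wf609:8786' scheduler. The glob pattern is a placeholder.
from dask.distributed import Client
import xarray as xr

client = Client(n_workers=2, threads_per_worker=1)

ds = xr.open_mfdataset('./lagrPartTrack.*.nc', lock=False)
x = ds.yParticle.data          # the underlying dask array
print(x.sum().compute())       # returns a scalar instead of raising "Not a valid ID"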

0 reactions
edougherty32 commented, Jul 28, 2018

Hi, I’m running into the same error message:

RuntimeError: NetCDF: Not a valid ID

when trying to get values from a dask array after performing a computation. Although I see this issue was resolved by https://github.com/pydata/xarray/pull/1095, I don’t see the explicit solution there.

Could you please point me to that solution? Thanks!
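
For readers who land here with the same error, the usual checklist (a suggestion based on this thread, not an official answer from it) is: upgrade to an xarray release that includes pydata/xarray#1095, leave file locking at its default rather than passing lock=False, and make sure the computation runs before the dataset is closed. A minimal sketch with a placeholder file pattern:

# Commonly suggested pattern, not the thread's verbatim fix: recent xarray,
# default locking, and compute while the dataset is still open.
from dask.distributed import Client
import xarray as xr

client = Client()                                # local distributed workers

ds = xr.open_mfdataset('./lagrPartTrack.*.nc')   # placeholder pattern, default lock
total = ds['yParticle'].sum().compute()          # compute before ds is closed
print(total)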
