problem with open_mfdataset on data with duplicate coordinate variables
I’m having a problem using open_mfdataset to open a set of files that have a “duplicate” dimension. In this case, I’m using a numerical weather model that I have configured on a 300x300 grid. The files are originally in HDF5 format with unnamed dimensions. Normally, the model stores the x and y dimensions as phony_dim_0 and phony_dim_1. However, in this case, since the x and y dimensions are the same size, the model has saved both the x and y dimensions of the variable as phony_dim_0. This creates a variable (Theta, for example) with an ncdump header like:
netcdf out1 {
dimensions:
phony_dim_1 = 60 ;
phony_dim_0 = 300 ;
variables:
float THETA(phony_dim_1, phony_dim_0, phony_dim_0) ;
The output files also do not include time information, so I make sure they’re named sequentially and open them with concat_dim='TIME' to force concatenation. However, due to the duplicate dimension, I get the error below.
MCVE Code Sample
I have uploaded sample data and a small program to attempt to open the files here: https://drive.google.com/file/d/1aayITXcwrAP_w9uNqppd9mpaQf3O51s8/view?usp=sharing
import xarray as xr
ds = xr.open_mfdataset("./out*.nc", concat_dim='TIME')
print(ds)
Expected Output
The expected output is for the contents of the dataset ds to be printed to the screen.
Problem Description
The program errors out with the following traceback:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-0079210385b2> in <module>
----> 1 control = xr.open_mfdataset(datadir + "feb2014_control/icefix*g1.h5", concat_dim='TIME')
~/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, lock, data_vars, coords, autoclose, parallel, **kwargs)
717 data_vars=data_vars, coords=coords,
718 infer_order_from_coords=infer_order_from_coords,
--> 719 ids=ids)
720 except ValueError:
721 for ds in datasets:
~/anaconda3/lib/python3.7/site-packages/xarray/core/combine.py in _auto_combine(datasets, concat_dims, compat, data_vars, coords, infer_order_from_coords, ids)
551 # Repeatedly concatenate then merge along each dimension
552 combined = _combine_nd(combined_ids, concat_dims, compat=compat,
--> 553 data_vars=data_vars, coords=coords)
554 return combined
555
~/anaconda3/lib/python3.7/site-packages/xarray/core/combine.py in _combine_nd(combined_ids, concat_dims, data_vars, coords, compat)
473 data_vars=data_vars,
474 coords=coords,
--> 475 compat=compat)
476 combined_ds = list(combined_ids.values())[0]
477 return combined_ds
~/anaconda3/lib/python3.7/site-packages/xarray/core/combine.py in _auto_combine_all_along_first_dim(combined_ids, dim, data_vars, coords, compat)
491 datasets = combined_ids.values()
492 new_combined_ids[new_id] = _auto_combine_1d(datasets, dim, compat,
--> 493 data_vars, coords)
494 return new_combined_ids
495
~/anaconda3/lib/python3.7/site-packages/xarray/core/combine.py in _auto_combine_1d(datasets, concat_dim, compat, data_vars, coords)
509 concatenated = [_auto_concat(list(ds_group), dim=dim,
510 data_vars=data_vars, coords=coords)
--> 511 for id, ds_group in grouped_by_vars]
512 else:
513 concatenated = datasets
~/anaconda3/lib/python3.7/site-packages/xarray/core/combine.py in <listcomp>(.0)
509 concatenated = [_auto_concat(list(ds_group), dim=dim,
510 data_vars=data_vars, coords=coords)
--> 511 for id, ds_group in grouped_by_vars]
512 else:
513 concatenated = datasets
~/anaconda3/lib/python3.7/site-packages/xarray/core/combine.py in _auto_concat(datasets, dim, data_vars, coords)
367 'explicitly')
368 dim, = concat_dims
--> 369 return concat(datasets, dim=dim, data_vars=data_vars, coords=coords)
370
371
~/anaconda3/lib/python3.7/site-packages/xarray/core/combine.py in concat(objs, dim, data_vars, coords, compat, positions, indexers, mode, concat_over)
118 raise TypeError('can only concatenate xarray Dataset and DataArray '
119 'objects, got %s' % type(first_obj))
--> 120 return f(objs, dim, data_vars, coords, compat, positions)
121
122
~/anaconda3/lib/python3.7/site-packages/xarray/core/combine.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions)
303 if k in concat_over:
304 vars = ensure_common_dims([ds.variables[k] for ds in datasets])
--> 305 combined = concat_vars(vars, dim, positions)
306 insert_result_variable(k, combined)
307
~/anaconda3/lib/python3.7/site-packages/xarray/core/variable.py in concat(variables, dim, positions, shortcut)
2083 along the given dimension.
2084 """
-> 2085 variables = list(variables)
2086 if all(isinstance(v, IndexVariable) for v in variables):
2087 return IndexVariable.concat(variables, dim, positions, shortcut)
~/anaconda3/lib/python3.7/site-packages/xarray/core/combine.py in ensure_common_dims(vars)
296 common_shape = tuple(non_concat_dims.get(d, dim_len)
297 for d in common_dims)
--> 298 var = var.set_dims(common_dims, common_shape)
299 yield var
300
~/anaconda3/lib/python3.7/site-packages/xarray/core/variable.py in set_dims(self, dims, shape)
1209 expanded_var = Variable(expanded_dims, expanded_data, self._attrs,
1210 self._encoding, fastpath=True)
-> 1211 return expanded_var.transpose(*dims)
1212
1213 def _stack_once(self, dims, new_dim):
~/anaconda3/lib/python3.7/site-packages/xarray/core/variable.py in transpose(self, *dims)
1152 return self.copy(deep=False)
1153
-> 1154 data = as_indexable(self._data).transpose(axes)
1155 return type(self)(dims, data, self._attrs, self._encoding,
1156 fastpath=True)
~/anaconda3/lib/python3.7/site-packages/xarray/core/indexing.py in transpose(self, order)
1210
1211 def transpose(self, order):
-> 1212 return self.array.transpose(order)
1213
1214
~/anaconda3/lib/python3.7/site-packages/dask/array/core.py in transpose(self, *axes)
1633 elif len(axes) == 1 and isinstance(axes[0], Iterable):
1634 axes = axes[0]
-> 1635 return transpose(self, axes=axes)
1636
1637 @derived_from(np.ndarray)
~/anaconda3/lib/python3.7/site-packages/dask/array/routines.py in transpose(a, axes)
155 if axes:
156 if len(axes) != a.ndim:
--> 157 raise ValueError("axes don't match array")
158 else:
159 axes = tuple(range(a.ndim))[::-1]
ValueError: axes don't match array
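The last few frames show where things actually go wrong: while concatenating, xarray builds the set of common dimension names for each variable, which collapses the repeated phony_dim_0 into a single entry, and then passes that shortened list of axes to transpose. A minimal numpy sketch of that final step (the shapes match the ncdump header above; the deduplication framing is my reading of the traceback, not confirmed by a maintainer):

```python
import numpy as np

# THETA is 3-D: (phony_dim_1, phony_dim_0, phony_dim_0)
a = np.zeros((60, 300, 300))

# After deduplicating dimension names, only two axis labels remain
# for a 3-D array -- numpy rejects this, producing the error above.
try:
    np.transpose(a, axes=(0, 1))
except ValueError as err:
    print(err)  # axes don't match array
```

This is why the failure surfaces deep inside dask/numpy rather than as a clear “duplicate dimension” message from xarray itself.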
Output of xr.show_versions()
xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.1.0
distributed: 2.1.0
matplotlib: 3.1.0
cartopy: 0.16.0
seaborn: 0.9.0
setuptools: 41.0.1
pip: 19.1.1
conda: 4.7.5
pytest: None
IPython: 7.6.1
sphinx: 2.1.2
Issue Analytics
- Created 4 years ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
It would be nice if we handled duplicate dimensions better, but it’s relatively rare so I’m not sure it’s worth making our code significantly more complex.
On Wed, Jul 31, 2019 at 9:26 AM Lucas notifications@github.com wrote:
Okay, I created the preprocess function below:
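(The function body itself did not survive the copy here. Below is only a hypothetical sketch of what such a fix_dims preprocess could look like — the uniquify_dims helper, the numeric-suffix renaming scheme, and the loop over ds.variables are my assumptions, not the original code:)

```python
def uniquify_dims(dims):
    """Rename repeated dimension names: ('a', 'b', 'b') -> ('a', 'b', 'b_2')."""
    seen = {}
    out = []
    for d in dims:
        seen[d] = seen.get(d, 0) + 1
        out.append(d if seen[d] == 1 else f"{d}_{seen[d]}")
    return tuple(out)


def fix_dims(ds):
    """Hypothetical preprocess hook for open_mfdataset: rebuild variables
    whose dimension tuple repeats a name, e.g.
    (phony_dim_1, phony_dim_0, phony_dim_0)."""
    import xarray as xr  # imported here so the helper above stays standalone
    fixed = {
        name: xr.Variable(uniquify_dims(var.dims), var.data, var.attrs)
        for name, var in ds.variables.items()
    }
    # Note: this simple sketch turns every variable into a data variable;
    # the files in this issue carry no coordinate variables anyway.
    return xr.Dataset(fixed, attrs=ds.attrs)
```

Once every variable has unique dimension names, the transpose inside concat no longer sees fewer axes than the array has, so open_mfdataset(..., preprocess=fix_dims) can proceed.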
And when I run
ds = open_mfdataset('<data_files>', concat_dim='time', preprocess=fix_dims)

it creates a dataset filled with the variables listed in the vars list.

Is there any interest in having xarray detect this problem in the future and automatically work around it? I’d be interested in possibly trying to figure that out.
Thanks for all the help!