xr.combine_nested() fails when passed nested DataSets

xr.__version__ '0.13.0'

xr.combine_nested() works when passed a nested list of DataArray objects.

import xarray as xr

da1 = xr.DataArray(name="a", data=[[0]], dims=["x", "y"])
da2 = xr.DataArray(name="b", data=[[1]], dims=["x", "y"])
da3 = xr.DataArray(name="a", data=[[2]], dims=["x", "y"])
da4 = xr.DataArray(name="b", data=[[3]], dims=["x", "y"])
xr.combine_nested([[da1, da2], [da3, da4]], concat_dim=["x", "y"])

returns

<xarray.DataArray 'a' (x: 2, y: 2)>
array([[0, 1],
       [2, 3]])
Dimensions without coordinates: x, y

but fails if passed a nested list of Dataset objects.

ds1 = da1.to_dataset()
ds2 = da2.to_dataset()
ds3 = da3.to_dataset()
ds4 = da4.to_dataset()
xr.combine_nested([[ds1, ds2], [ds3, ds4]], concat_dim=["x", "y"])

raises

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-8-c0035883fc68> in <module>
      3 ds3 = da3.to_dataset()
      4 ds4 = da4.to_dataset()
----> 5 xr.combine_nested([[ds1, ds2], [ds3, ds4]], concat_dim=["x", "y"])

~/repos/contribute/xarray/xarray/core/combine.py in combine_nested(datasets, concat_dim, compat, data_vars, coords, fill_value, join)
    462         ids=False,
    463         fill_value=fill_value,
--> 464         join=join,
    465     )
    466 

~/repos/contribute/xarray/xarray/core/combine.py in _nested_combine(datasets, concat_dims, compat, data_vars, coords, ids, fill_value, join)
    305         coords=coords,
    306         fill_value=fill_value,
--> 307         join=join,
    308     )
    309     return combined

~/repos/contribute/xarray/xarray/core/combine.py in _combine_nd(combined_ids, concat_dims, data_vars, coords, compat, fill_value, join)
    196             compat=compat,
    197             fill_value=fill_value,
--> 198             join=join,
    199         )
    200     (combined_ds,) = combined_ids.values()

~/repos/contribute/xarray/xarray/core/combine.py in _combine_all_along_first_dim(combined_ids, dim, data_vars, coords, compat, fill_value, join)
    218         datasets = combined_ids.values()
    219         new_combined_ids[new_id] = _combine_1d(
--> 220             datasets, dim, compat, data_vars, coords, fill_value, join
    221         )
    222     return new_combined_ids

~/repos/contribute/xarray/xarray/core/combine.py in _combine_1d(datasets, concat_dim, compat, data_vars, coords, fill_value, join)
    246                 compat=compat,
    247                 fill_value=fill_value,
--> 248                 join=join,
    249             )
    250         except ValueError as err:

~/repos/contribute/xarray/xarray/core/concat.py in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    131             "objects, got %s" % type(first_obj)
    132         )
--> 133     return f(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    134 
    135 

~/repos/contribute/xarray/xarray/core/concat.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join)
    363     for k in datasets[0].variables:
    364         if k in concat_over:
--> 365             vars = ensure_common_dims([ds.variables[k] for ds in datasets])
    366             combined = concat_vars(vars, dim, positions)
    367             assert isinstance(combined, Variable)

~/repos/contribute/xarray/xarray/core/concat.py in <listcomp>(.0)
    363     for k in datasets[0].variables:
    364         if k in concat_over:
--> 365             vars = ensure_common_dims([ds.variables[k] for ds in datasets])
    366             combined = concat_vars(vars, dim, positions)
    367             assert isinstance(combined, Variable)

~/repos/contribute/xarray/xarray/core/utils.py in __getitem__(self, key)
    383 
    384     def __getitem__(self, key: K) -> V:
--> 385         return self.mapping[key]
    386 
    387     def __iter__(self) -> Iterator[K]:

KeyError: 'a'
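
The last frames of the traceback suggest where the lookup goes wrong: the loop in _dataset_concat iterates over the variable names of the first dataset only, then indexes every dataset with each of those names. Below is a minimal sketch of that pattern, assuming plain dictionaries as stand-ins for Dataset.variables. The helper concat_like_lookup is hypothetical and is not xarray code.

```python
# Plain dicts stand in for Dataset.variables (an assumption for illustration).
ds1_vars = {"a": [[0]]}
ds2_vars = {"b": [[1]]}


def concat_like_lookup(datasets):
    """Mimic the lookup pattern shown in the concat.py frames of the traceback."""
    collected = {}
    # Only the first dataset's variable names are iterated ...
    for k in datasets[0]:
        # ... and every dataset is indexed with that name.
        collected[k] = [ds[k] for ds in datasets]
    return collected


try:
    concat_like_lookup([ds1_vars, ds2_vars])
    error = None
except KeyError as err:
    error = err

print(repr(error))  # KeyError('a')
```

In this simplified model the second mapping has no "a" entry, so the list comprehension raises the same KeyError: 'a' that ends the traceback.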

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

1 reaction
dcherian commented, Sep 25, 2019

concat ignores DataArray.name. I don’t know if we should consider it a bug or a feature 😃

0 reactions
friedrichknuth commented, Feb 12, 2020

A few observations after looking at the default flags for concat:

xr.concat(
    objs,
    dim,
    data_vars='all',
    coords='different',
    compat='equals',
    positions=None,
    fill_value=<NA>,
    join='outer',
)

The description of compat='equals' indicates that combining DataArrays with different names should fail: "'equals': all values and dimensions must be the same." (I am not entirely sure what is meant by "values", though; I assume it generically means keys?)

Another option is compat='identical', which is described as "'identical': all values, dimensions and attributes must be the same." Using this flag causes the operation to fail, as one would expect from the description…

objs = [xr.DataArray([0], 
                     dims='x', 
                     name='a'),
        xr.DataArray([1], 
                     dims='x', 
                     name='b')]

xr.concat(objs, dim='x', compat='identical')
ValueError: array names not identical

… and concat also fails on Datasets, even with the default flags, as previously shown by @TomNicholas

objs = [xr.Dataset({'a': ('x', [0])}),
        xr.Dataset({'b': ('x', [0])})]

xr.concat(objs, dim='x')
ValueError: 'a' is not present in all datasets.

However, the description "'identical': all values, dimensions and **attributes** must be the same" doesn’t quite seem to hold for DataArrays, as

objs = [xr.DataArray([0], 
                     dims='x', 
                     name='a', 
                     attrs={'foo':1}),
        xr.DataArray([1], 
                     dims='x', 
                     name='a', 
                     attrs={'bar':2})]

xr.concat(objs, dim='x', compat='identical')

succeeds with

<xarray.DataArray 'a' (x: 2)>
array([0, 1])
Dimensions without coordinates: x
Attributes:
    foo:      1

but again fails on Datasets, as one would expect from the description.

ds1 = xr.Dataset({'a': ('x', [0])})
ds1.attrs['foo'] = 'example attribute'

ds2 = xr.Dataset({'a': ('x', [1])})
ds2.attrs['bar'] = 'example attribute'

objs = [ds1,ds2]
xr.concat(objs, dim='x',compat='identical')
ValueError: Dataset global attributes not equal.

I also had a look at compat='override', which overrides an attrs inconsistency but not a naming one when applied to Datasets. It works as expected on DataArrays. It is described as "'override': skip comparing and pick variable from first dataset."

Potential resolutions:

  1. 'identical' should raise an error when attributes are not the same for DataArrays

  2. 'equals' should raise an error when DataArray names are not identical (unless one is None, which works with Datasets and seems fine to be replaced)

  3. 'override' should override naming inconsistencies when combining Datasets.
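
To make the three proposals concrete, here is a rough sketch of the name and attribute checks they seem to imply. It assumes a plain dataclass in place of real xarray objects, so it does not distinguish DataArrays from Datasets. ArrayMeta and resolve_name are hypothetical names introduced only for this illustration; this is not how xarray implements compat.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ArrayMeta:
    """Hypothetical stand-in for the name and attrs of a DataArray."""
    name: Optional[str]
    attrs: dict = field(default_factory=dict)


def resolve_name(objs, compat="equals"):
    """Return the name a combined result would carry under the proposals."""
    first = objs[0]
    if compat == "override":
        # Proposal 3: skip all comparisons and take the first object's name.
        return first.name
    name = first.name
    for other in objs[1:]:
        if compat == "identical":
            # Proposal 1: names and attributes must both match exactly.
            if other.name != first.name:
                raise ValueError("array names not identical")
            if other.attrs != first.attrs:
                raise ValueError("array attributes not identical")
        elif compat == "equals":
            # Proposal 2: differing names fail unless one of them is None.
            if name is None:
                name = other.name
            elif other.name is not None and other.name != name:
                raise ValueError("array names not identical")
        else:
            raise ValueError(f"unsupported compat: {compat!r}")
    return name


a = ArrayMeta("a", {"foo": 1})
b = ArrayMeta("b", {"bar": 2})
print(resolve_name([a, b], compat="override"))  # prints: a
```

Under this sketch, resolve_name([a, b]) with the default compat="equals" raises ValueError, while a list mixing ArrayMeta(None) with a resolves to "a".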

Final thought: perhaps promoting a DataArray to a Dataset whenever it meets all the requirements to be treated as one might simplify keeping operations and checks consistent?
