Should Dataset constructor align DataArrays based on coords?
We have a DataArray unaligned to coords here:
In [12]:
coords={'a': range(4), 'b': range(3)}
da = xray.DataArray(np.random.rand(4,3), coords=coords)
da
Out[12]:
<xray.DataArray (a: 4, b: 3)>
array([[ 0.05126985, 0.95460352, 0.12853847],
[ 0.20577943, 0.80265117, 0.46370886],
[ 0.0226791 , 0.33068145, 0.55748573],
[ 0.15943175, 0.20183347, 0.46907727]])
Coordinates:
* a (a) int64 0 1 2 3
* b (b) int64 0 1 2
In [13]:
da_reindex = da.reindex(b=[1,2,0])
da_reindex
Out[13]:
<xray.DataArray (a: 4, b: 3)>
array([[ 0.95460352, 0.12853847, 0.05126985],
[ 0.80265117, 0.46370886, 0.20577943],
[ 0.33068145, 0.55748573, 0.0226791 ],
[ 0.20183347, 0.46907727, 0.15943175]])
Coordinates:
* a (a) int64 0 1 2 3
* b (b) int64 1 2 0
If we add this to a Dataset and supply coords=coords, it raises, since there are conflicting coords:
In [16]:
ds = xray.Dataset(variables={'da':da_reindex}, coords=coords)
ds
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-16-8e4afa10f781> in <module>()
----> 1 ds = xray.Dataset(variables={'da':da_reindex}, coords=coords)
2 ds
/usr/local/lib/python2.7/dist-packages/xray/core/dataset.pyc in __init__(self, variables, coords, attrs, compat)
317 coords = set()
318 if variables or coords:
--> 319 self._set_init_vars_and_dims(variables, coords, compat)
320 if attrs is not None:
321 self.attrs = attrs
/usr/local/lib/python2.7/dist-packages/xray/core/dataset.pyc in _set_init_vars_and_dims(self, vars, coords, compat)
371 aligned = _align_variables(variables)
372 new_variables, new_coord_names = _expand_variables(aligned,
--> 373 compat=compat)
374
375 new_coord_names.update(coords)
/usr/local/lib/python2.7/dist-packages/xray/core/dataset.pyc in _expand_variables(raw_variables, old_variables, compat)
142 add_variable(dim, coord.variable)
143 var = var.variable
--> 144 add_variable(name, var)
145
146 return new_variables, new_coord_names
/usr/local/lib/python2.7/dist-packages/xray/core/dataset.pyc in add_variable(name, var)
130 raise ValueError('conflicting value for variable %s:\n'
131 'first value: %r\nsecond value: %r'
--> 132 % (name, variables[name], var))
133 if compat == 'broadcast_equals':
134 maybe_promote_or_replace(name, var)
ValueError: conflicting value for variable b:
first value: <xray.Coordinate 'b' (b: 3)>
array([1, 2, 0])
second value: <xray.Coordinate 'b' (b: 3)>
array([0, 1, 2])
But adding with __setitem__ aligns as expected:
In [17]:
ds = xray.Dataset(coords=coords)
ds['da']=da_reindex
ds
Out[17]:
<xray.Dataset>
Dimensions: (a: 4, b: 3)
Coordinates:
* a (a) int64 0 1 2 3
* b (b) int64 0 1 2
Data variables:
da (a, b) float64 0.05127 0.9546 0.1285 0.2058 0.8027 0.4637 ...
Is this desired behavior? I could imagine aligning to coords if they are explicitly supplied, but I don’t know the nuances here.
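In the meantime, one workaround that does not depend on what the constructor does is to reindex the array onto the intended coords before building the Dataset. The snippet below is only a minimal sketch of that idea. It assumes the package's later name, xarray (xray was renamed), swaps np.random.rand for deterministic data so the result is easy to check, and uses the illustrative variable name aligned, which is not from the original session.

```python
import numpy as np
import xarray as xr  # assumption: xray's successor package, not the original import

coords = {'a': [0, 1, 2, 3], 'b': [0, 1, 2]}
da = xr.DataArray(np.arange(12.0).reshape(4, 3), coords=coords, dims=('a', 'b'))

# Same data, but with the 'b' labels in a different order.
da_reindex = da.reindex(b=[1, 2, 0])

# Put 'b' back in the order of the coords we intend to pass explicitly.
aligned = da_reindex.reindex(b=coords['b'])

# The variable and the explicit coords now agree, so there is no conflict.
ds = xr.Dataset({'da': aligned}, coords=coords)
print(ds)
```

This keeps the alignment step explicit in user code, at the cost of an extra reindex call for every variable passed in.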
Issue Analytics
- State:
- Created 8 years ago
- Comments:8 (4 by maintainers)
Top GitHub Comments
I’d be up for implementing this, if people agree. Let me know…
Yes. I think we have some existing internal machinery for this in align (see indexes). Indexes that are explicitly provided in coords (e.g., coords={'time': time_values}) should take precedence. Indexes that are incidentally included on another coordinate should not (e.g., coords={'day_of_week': DataArray(values, coords=[('time', time_values)])}).
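To make the precedence rule above concrete, here is a library-free sketch using plain Python lists. It is not how xray implements alignment; the helper align_to_explicit_index and the sample values are hypothetical and exist only to show an explicitly supplied index winning over one that arrives attached to the data.

```python
def align_to_explicit_index(labels, values, explicit_labels):
    """Reorder `values` (tagged by `labels`) to follow `explicit_labels`.

    Hypothetical helper, not part of xray. Labels that are missing from
    the data are filled with None, loosely mimicking a reindex fill value.
    """
    by_label = dict(zip(labels, values))
    return [by_label.get(label) for label in explicit_labels]


# Index supplied explicitly to the constructor: this one takes precedence.
explicit_b = [0, 1, 2]

# Index that arrives incidentally on the variable, in a different order.
incidental_b = [1, 2, 0]
row = [0.95, 0.13, 0.05]  # values ordered like incidental_b

aligned_row = align_to_explicit_index(incidental_b, row, explicit_b)
print(aligned_row)
```

Here the value labelled 0 moves to the front, so the row follows explicit_b rather than the order it arrived in.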