Should Dataset constructor align DataArrays based on coords?

Here is a DataArray built from coords, followed by a reindexed copy whose b coordinate is no longer aligned with coords:

In [12]:

coords={'a': range(4), 'b': range(3)}
da = xray.DataArray(np.random.rand(4,3), coords=coords)
da
Out[12]:
<xray.DataArray (a: 4, b: 3)>
array([[ 0.05126985,  0.95460352,  0.12853847],
       [ 0.20577943,  0.80265117,  0.46370886],
       [ 0.0226791 ,  0.33068145,  0.55748573],
       [ 0.15943175,  0.20183347,  0.46907727]])
Coordinates:
  * a        (a) int64 0 1 2 3
  * b        (b) int64 0 1 2
In [13]:

da_reindex = da.reindex(b=[1,2,0])
da_reindex
Out[13]:
<xray.DataArray (a: 4, b: 3)>
array([[ 0.95460352,  0.12853847,  0.05126985],
       [ 0.80265117,  0.46370886,  0.20577943],
       [ 0.33068145,  0.55748573,  0.0226791 ],
       [ 0.20183347,  0.46907727,  0.15943175]])
Coordinates:
  * a        (a) int64 0 1 2 3
  * b        (b) int64 1 2 0
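The reindex call above is label-based. Each column of values moves together with its b label, so the column labelled 1 (0.9546...) now comes first. The data is not relabelled in place. Below is a minimal pure-Python sketch of that reordering. The reindex_columns helper is hypothetical (not part of xray/xarray), and it assumes every requested label already exists. Real reindexing fills missing labels with NaN.

```python
def reindex_columns(rows, labels, new_labels):
    """Reorder the columns of `rows` so they follow `new_labels`.

    Hypothetical helper for illustration only. It assumes every label in
    `new_labels` is present in `labels` (no NaN filling).
    """
    position = {label: i for i, label in enumerate(labels)}
    order = [position[label] for label in new_labels]
    return [[row[i] for i in order] for row in rows]


rows = [[0.05, 0.95, 0.13],
        [0.21, 0.80, 0.46]]
print(reindex_columns(rows, labels=[0, 1, 2], new_labels=[1, 2, 0]))
# → [[0.95, 0.13, 0.05], [0.8, 0.46, 0.21]]
```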

If we pass da_reindex to the Dataset constructor and also supply coords=coords, it raises a ValueError, because the two values for the b coordinate conflict:

In [16]:

ds = xray.Dataset(variables={'da':da_reindex}, coords=coords)
ds
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-8e4afa10f781> in <module>()
----> 1 ds = xray.Dataset(variables={'da':da_reindex}, coords=coords)
      2 ds

/usr/local/lib/python2.7/dist-packages/xray/core/dataset.pyc in __init__(self, variables, coords, attrs, compat)
    317             coords = set()
    318         if variables or coords:
--> 319             self._set_init_vars_and_dims(variables, coords, compat)
    320         if attrs is not None:
    321             self.attrs = attrs

/usr/local/lib/python2.7/dist-packages/xray/core/dataset.pyc in _set_init_vars_and_dims(self, vars, coords, compat)
    371         aligned = _align_variables(variables)
    372         new_variables, new_coord_names = _expand_variables(aligned,
--> 373                                                            compat=compat)
    374 
    375         new_coord_names.update(coords)

/usr/local/lib/python2.7/dist-packages/xray/core/dataset.pyc in _expand_variables(raw_variables, old_variables, compat)
    142                     add_variable(dim, coord.variable)
    143             var = var.variable
--> 144         add_variable(name, var)
    145 
    146     return new_variables, new_coord_names

/usr/local/lib/python2.7/dist-packages/xray/core/dataset.pyc in add_variable(name, var)
    130                 raise ValueError('conflicting value for variable %s:\n'
    131                                  'first value: %r\nsecond value: %r'
--> 132                                  % (name, variables[name], var))
    133             if compat == 'broadcast_equals':
    134                 maybe_promote_or_replace(name, var)

ValueError: conflicting value for variable b:
first value: <xray.Coordinate 'b' (b: 3)>
array([1, 2, 0])
second value: <xray.Coordinate 'b' (b: 3)>
array([0, 1, 2])
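The traceback shows add_variable comparing the value of b carried by the DataArray with the value supplied in coords, and raising on a mismatch instead of reindexing. The following is a simplified pure-Python sketch of that kind of strict check. The merge_coords name is hypothetical, and plain list equality stands in for xray's compat option, so the same labels in a different order count as a conflict.

```python
def merge_coords(*sources):
    """Collect coordinates by name, raising if two sources disagree.

    Hypothetical simplification: values are converted to lists and compared
    with ==, so [1, 2, 0] and [0, 1, 2] conflict even though they hold the
    same labels.
    """
    merged = {}
    for source in sources:
        for name, values in source.items():
            values = list(values)
            if name in merged and merged[name] != values:
                raise ValueError(
                    "conflicting value for variable %s:\n"
                    "first value: %r\nsecond value: %r"
                    % (name, merged[name], values))
            merged[name] = values
    return merged


from_variable = {"a": range(4), "b": [1, 2, 0]}  # coords carried by da_reindex
explicit = {"a": range(4), "b": range(3)}        # coords=coords
try:
    merge_coords(from_variable, explicit)
except ValueError as err:
    print(err)
# → conflicting value for variable b:
#   first value: [1, 2, 0]
#   second value: [0, 1, 2]
```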

But assigning it with __setitem__ aligns it as expected:

In [17]:

ds = xray.Dataset(coords=coords)
ds['da']=da_reindex
ds
Out[17]:
<xray.Dataset>
Dimensions:  (a: 4, b: 3)
Coordinates:
  * a        (a) int64 0 1 2 3
  * b        (b) int64 0 1 2
Data variables:
    da       (a, b) float64 0.05127 0.9546 0.1285 0.2058 0.8027 0.4637 ...
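In the output above, the first values of da read 0.05127 0.9546 0.1285, which is the order of the original da and not of da_reindex. This suggests that assignment reindexes the incoming array onto the index the Dataset already owns for b. Below is a toy sketch of that behaviour for a single dimension. MiniDataset is a hypothetical stand-in, not an xray class, and None stands in for the NaN that real alignment would insert for missing labels.

```python
class MiniDataset:
    """Toy stand-in for a Dataset with a single indexed dimension.

    Hypothetical illustration only. The dataset owns the index, and
    __setitem__ reorders an incoming (labels, values) pair onto it by label,
    filling labels the incoming array lacks with None.
    """

    def __init__(self, index):
        self.index = list(index)
        self.data_vars = {}

    def __setitem__(self, name, labelled):
        labels, values = labelled
        by_label = dict(zip(labels, values))
        self.data_vars[name] = [by_label.get(label) for label in self.index]


ds = MiniDataset(index=range(3))                  # like Dataset(coords=coords), for b only
ds["da"] = ([1, 2, 0], [0.9546, 0.1285, 0.0513])  # one row of da_reindex, labelled b = 1, 2, 0
print(ds.index, ds.data_vars["da"])
# → [0, 1, 2] [0.0513, 0.9546, 0.1285]
```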

Is this desired behavior? I could imagine aligning to coords if they are explicitly supplied, but I don’t know the nuances here.

Issue Analytics

  • State: closed
  • Created 8 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
max-sixty commented, Dec 10, 2017

I’d be up for implementing this, if people agree. Let me know…

0 reactions
shoyer commented, Dec 11, 2017

Yes. I think we have some existing internal machinery for this in align (see indexes). Indexes that are explicitly provided in coords (e.g., coords={'time': time_values}) should take precedence. Indexes that are incidentally included on another coordinate should not (e.g., coords={'day_of_week': DataArray(values, coords=[('time', time_values)])}).
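The precedence rule in this comment can be modelled as a small decision function. The sketch is a hypothetical illustration and not xarray's align machinery. The `explicit` argument holds indexes passed directly in coords, such as coords={'time': time_values}. The `incidental` argument lists indexes that merely ride along on other coordinates, such as the time index attached to a day_of_week DataArray. Raising when incidental indexes disagree is an assumption of this sketch; the comment does not say how such a conflict should be resolved.

```python
def choose_index(dim, explicit, incidental):
    """Pick the index for `dim`, giving explicit coords precedence.

    Hypothetical model: `explicit` maps dimension names to indexes passed
    directly in coords; `incidental` is a list of {dim: labels} dicts for
    indexes carried by other coordinates. If no explicit index exists, the
    incidental ones must agree (raising otherwise is an assumption here).
    """
    if dim in explicit:
        return list(explicit[dim])
    candidates = {tuple(index[dim]) for index in incidental if dim in index}
    if len(candidates) > 1:
        raise ValueError("conflicting indexes for dimension %r" % dim)
    return list(candidates.pop()) if candidates else None


explicit = {"time": [0, 1, 2]}
incidental = [{"time": [2, 1, 0]}]  # time index carried by a day_of_week coordinate
print(choose_index("time", explicit, incidental))  # → [0, 1, 2]
print(choose_index("time", {}, incidental))        # → [2, 1, 0]
```

Keeping the choice of index separate from the reindexing step means the data variables can then be aligned onto whichever index wins, explicit or not.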
