DataArray.set_index throws error on documented input

Problem Description

Docs for DataArray.set_index describe the main indexes argument as:

Mapping from names matching dimensions and values given by (lists of) the names of existing coordinates or variables to set as new (multi-)index.

This suggests that one can set a DataArray instance’s coordinates by passing in a dimension and a list-like object of coordinates.

MCVE

In [1]: import numpy as np

In [2]: import xarray as xr

In [3]: arr = xr.DataArray(data=np.ones((2, 3)), dims=['x', 'y'])

In [4]: arr.dims
Out[4]: ('x', 'y')

In [5]: arr.set_index({'x': range(2)})
KeyError   
...
    144         for n in var_names:
--> 145             var = variables[n]
    146             if (current_index_variable is not None and
    147                     var.dims != current_index_variable.dims):

KeyError: 0

At first, I thought it might be because coords and _coords were not being set in this case:

In [18]: arr.coords
Out[18]: 
Coordinates:
    *empty*

In [19]: arr._coords
Out[19]: OrderedDict()

but even if I set the coordinates first and then try to re-index, it fails:

In [20]: arr = xr.DataArray(data=np.ones((2, 3)), dims=['x', 'y'], coords={'x': range(2), 'y': range(3)})
In [21]: arr.set_index({'x': ['a', 'b', 'c']})
...
    144         for n in var_names:
--> 145             var = variables[n]
    146             if (current_index_variable is not None and
    147                     var.dims != current_index_variable.dims):

Expected Output

I expect my MCVE to work based on the documentation.
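For comparison, here is a minimal sketch of input that set_index does accept if the docstring's "names of existing coordinates" is read literally. The extra coordinate "a" and the labels "p" and "q" are illustrative assumptions, not part of the original report. It also shows assign_coords as one way to attach brand-new labels to a dimension, which appears to be what the MCVE was trying to do.

```python
import numpy as np
import xarray as xr

# Same shape as the MCVE, plus an extra coordinate "a" along "x".
# The name "a" and its values are assumptions made for illustration.
arr = xr.DataArray(
    data=np.ones((2, 3)),
    dims=["x", "y"],
    coords={"x": range(2), "y": range(3), "a": ("x", [3, 4])},
)

# The mapping value is the *name* of an existing coordinate, not a list of new labels.
indexed = arr.set_index(x="a")

# To put new labels on a dimension, assign_coords takes the values directly.
relabeled = arr.assign_coords(x=["p", "q"])

print(indexed["x"].values)
print(relabeled["x"].values)
```

In this reading, set_index promotes a coordinate that already exists, while assign_coords creates or replaces one.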

Problem Solution

My guess is that the issue is that xarray uses the merge_indexes function from the Dataset module, and there is no concept of a variable in a DataArray.
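One way to read the traceback is that the loop treats every element of the mapping value as a key into a dictionary of variables, so raw labels such as 0 are looked up as names. The sketch below is a simplified pure-Python stand-in for that loop, not xarray's actual code; lookup_index_variables and the contents of variables are hypothetical.

```python
# Simplified stand-in for the lookup loop shown in the traceback.
# `lookup_index_variables` and `variables` are hypothetical names for illustration.
def lookup_index_variables(variables, indexes):
    found = {}
    for dim, var_names in indexes.items():
        # A single name is treated as a one-element list of names.
        if isinstance(var_names, str):
            var_names = [var_names]
        # Each element is used as a dictionary key, as in `var = variables[n]`.
        found[dim] = [variables[n] for n in var_names]
    return found


variables = {"x": [0, 1], "a": [3, 4]}

# Names of existing entries resolve without error.
ok = lookup_index_variables(variables, {"x": ["a"]})

# Raw values are treated as names, so the first element (0) is looked up as a key.
try:
    lookup_index_variables(variables, {"x": range(2)})
    error_key = None
except KeyError as exc:
    error_key = exc.args[0]

print(ok, error_key)
```

Under this reading, the KeyError: 0 in the MCVE comes from the first element of range(2) being used as a variable name.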

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

gwgundersen commented, Aug 18, 2019 (1 reaction)

Looks like the idea of a glossary is already being discussed in https://github.com/pydata/xarray/issues/2410.

max-sixty commented, Aug 18, 2019 (0 reactions)

Good work on finding that issue. I think even if we can get something brief in, that would be helpful.

On the specific definitions:

What do you think of the terminology “alternative” or “auxiliary” dimension? a is clearly a dimension in the sense that it has coordinates or labels for all the “tick marks” along the x dimension.

For me ‘dimension’ has a precise definition from traditional sciences, so having our ‘coordinate’ be an additional / auxiliary / alternative dimension wouldn’t be consistent with that (e.g. a 4-dimensional array would still be 4 dimensional regardless of how many coordinates it had).

At the very least, I’d love to add a lot more examples of how to actually use these things.
