Set one-dimensional data variable as dimension coordinate?

Code Sample

I have this dataset, and I’d like to make it indexable by time:

<xarray.Dataset>
Dimensions:                (station_observations: 46862)
Dimensions without coordinates: station_observations
Data variables:
    time                   (station_observations) datetime64[ns] ...
    SNOW_ON_THE_GROUND     (station_observations) float64 ...
    ONE_DAY_SNOW           (station_observations) float64 ...
    ONE_DAY_RAIN           (station_observations) float64 ...
    ONE_DAY_PRECIPITATION  (station_observations) float64 ...
    MIN_TEMP               (station_observations) float64 ...
    MAX_TEMP               (station_observations) float64 ...
Attributes:
    elevation:     15.0

Problem description

I expected to be able to use ds.set_coords to make the time variable an indexable coordinate. The variable IS converted to a coordinate, but it is not a dimension coordinate, so I can’t index with it. I can use assign_coords(station_observations=ds.time) to make station_observations indexable by time, but then the name is semantically wrong, and the time variable still exists as a data variable, which makes the code harder to maintain.

Expected Output

ds.set_coords('time', inplace=True)
<xarray.Dataset>
Dimensions:                (station_observations: 46862)
Coordinates:
    time                   (station_observations) datetime64[ns] ...
Dimensions without coordinates: station_observations
Data variables:
    SNOW_ON_THE_GROUND     (station_observations) float64 ...
    ONE_DAY_SNOW           (station_observations) float64 ...
    ONE_DAY_RAIN           (station_observations) float64 ...
    ONE_DAY_PRECIPITATION  (station_observations) float64 ...
    MIN_TEMP               (station_observations) float64 ...
    MAX_TEMP               (station_observations) float64 ...
Attributes:
    elevation:     15.0

In [95]: ds.sel(time='1896')
ValueError: dimensions or multi-index levels ['time'] do not exist

With assign_coords:

In [97]: ds=ds.assign_coords(station_observations=ds.time)

In [98]: ds.sel(station_observations='1896')
Out[98]: 
<xarray.Dataset>
Dimensions:                (station_observations: 366)
Coordinates:
  * station_observations   (station_observations) datetime64[ns] 1896-01-01 ...
Data variables:
    time                   (station_observations) datetime64[ns] ...
    SNOW_ON_THE_GROUND     (station_observations) float64 ...
    ONE_DAY_SNOW           (station_observations) float64 ...
    ONE_DAY_RAIN           (station_observations) float64 ...
    ONE_DAY_PRECIPITATION  (station_observations) float64 ...
    MIN_TEMP               (station_observations) float64 ...
    MAX_TEMP               (station_observations) float64 ...
Attributes:
    elevation:     15.0

This works correctly, but looks ugly. It would be nice if the time variable could be assigned as a dimension directly. I can drop the time variable and rename station_observations, but it’s a little annoying to do so.
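
The drop-and-rename workaround mentioned above could be written as a single chain. This is a minimal sketch, not part of the original report: the small synthetic dataset and its single MAX_TEMP variable stand in for the real station data, and it assumes a recent xarray release where Dataset.drop_vars is available (in older releases, Dataset.drop played that role).

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the station dataset: 400 daily observations.
times = pd.date_range("1895-12-30", periods=400, freq="D")
ds = xr.Dataset(
    {
        "time": ("station_observations", times),
        "MAX_TEMP": ("station_observations", np.linspace(-5.0, 30.0, 400)),
    },
    attrs={"elevation": 15.0},
)

indexed = (
    ds.assign_coords(station_observations=ds.time)  # index the dimension by time
    .drop_vars("time")                              # remove the duplicate data variable
    .rename({"station_observations": "time"})       # give the dimension a meaningful name
)

subset = indexed.sel(time="1896")
print(subset.sizes["time"])
```

The three steps are exactly what makes this approach feel heavy: every one of them is needed just to end up with a single dimension named time.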

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.16.0-041600-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8

xarray: 0.10.2
pandas: 0.22.0
numpy: 1.13.3
scipy: 0.19.1
netCDF4: 1.3.1
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: 1.2.0
cyordereddict: None
dask: 0.16.0
distributed: None
matplotlib: 2.1.1
cartopy: None
seaborn: None
setuptools: 39.0.1
pip: 9.0.1
conda: None
pytest: None
IPython: 5.5.0
sphinx: None

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 13 (6 by maintainers)

Top GitHub Comments

keewis commented, Nov 21, 2019

To get your example to work, use this:

data=pd.DataFrame({'x': [0, 1, 2], 'y': ['a', 'b', 'c'],'z': ['a', 'b', 'c']})
xr.DataArray(data.x, coords={'y': data.y, 'x':data.z}, dims="y")

To get both as dimensions, use:

df = pd.DataFrame({'x': [0, 1, 2], 'y': ['a', 'b', 'c'],'z': ['a', 'b', 'c']})
ds = df.set_index(["y", "z"]).to_xarray()
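
As a rough illustration of what the second snippet produces (a sketch, assuming current pandas and xarray behaviour): setting both columns as a MultiIndex and calling to_xarray yields a Dataset with y and z as dimension coordinates, and x is expanded onto the full 3 x 3 grid, with NaN wherever a (y, z) pair does not occur in the frame.

```python
import pandas as pd  # DataFrame.to_xarray requires xarray to be installed

df = pd.DataFrame({"x": [0, 1, 2], "y": ["a", "b", "c"], "z": ["a", "b", "c"]})
ds = df.set_index(["y", "z"]).to_xarray()

print(dict(ds.sizes))   # both y and z are now dimensions
print(ds.x.values)      # original values on the diagonal, NaN elsewhere

# Label-based selection works on either dimension.
print(ds.sel(y="b", z="b").x.item())
```

Because the grid is dense, this form only makes sense when the two columns really describe independent axes; for a single observation axis it inflates the data.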

fujiisoup commented, Oct 4, 2018

Hi @nedclimaterisk. Thanks for raising an issue.

In that case, you can use swap_dims:

In [1]: import xarray as xr
   ...: ds = xr.Dataset({'x': ('i', [0, 1, 2]), 'y': ('i', ['a', 'b', 'c'])})
   ...: ds
   ...: 
   ...: 
Out[1]: 
<xarray.Dataset>
Dimensions:  (i: 3)
Dimensions without coordinates: i
Data variables:
    x        (i) int64 0 1 2
    y        (i) <U1 'a' 'b' 'c'

In [2]: ds.swap_dims({'i': 'x'})
Out[2]: 
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 0 1 2
Data variables:
    y        (x) <U1 'a' 'b' 'c'
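
Applied to a dataset shaped like the one in the question, the same call might look like the sketch below. The synthetic data and the single MAX_TEMP variable are placeholders rather than the reporter's real file; the point is only that swap_dims promotes the existing time variable to a dimension coordinate, after which label-based selection by year works.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Placeholder data: 400 daily observations that cover all of 1896.
ds = xr.Dataset(
    {
        "time": ("station_observations", pd.date_range("1895-12-30", periods=400, freq="D")),
        "MAX_TEMP": ("station_observations", np.linspace(-5.0, 30.0, 400)),
    }
)

# Replace the unlabeled dimension with the existing 1-D 'time' variable.
by_time = ds.swap_dims({"station_observations": "time"})

# 'time' is now an indexed dimension coordinate, so partial date strings work.
year_1896 = by_time.sel(time="1896")
print(dict(by_time.sizes))
print(year_1896.sizes["time"])
```

Unlike the assign_coords route, no duplicate time data variable is left behind and no separate rename step is needed.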