MultiIndex listed multiple times in Dataset.indexes property

See original GitHub issue

What happened?

When upgrading from 2022.3.0 to 2022.6.0.rc0 I noticed a possibly unintended breaking change in the Dataset.indexes property. When accessing dataset.indexes, a MultiIndex is now listed once under its own name and once more for each of the dimensions (levels) it applies to.

What did you expect to happen?

Same behaviour as before, see example below.

Minimal Complete Verifiable Example

# execute with 2022.3.0 and 2022.6.0.rc0 to see the differences
import pandas
import xarray as xr

def _create_multiindex(**kwargs):
    return pandas.MultiIndex.from_arrays(list(kwargs.values()), names=kwargs.keys())


ds = xr.Dataset()
ds.coords["measurement"] = _create_multiindex(
    observation=["A", "A", "B", "B"],
    wavelength=[0.4, 0.5, 0.6, 0.7],
    stokes=["I", "Q", "I", "I"],
)

for name, idx in ds.indexes.items():
    print(name, idx)

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

Output with version 2022.3.0:

measurement MultiIndex([('A', 0.4, 'I'),
            ('A', 0.5, 'Q'),
            ('B', 0.6, 'I'),
            ('B', 0.7, 'I')],
           names=['observation', 'wavelength', 'stokes'])


Output with version 2022.6.0.rc0:

measurement MultiIndex([('A', 0.4, 'I'),
            ('A', 0.5, 'Q'),
            ('B', 0.6, 'I'),
            ('B', 0.7, 'I')],
           name='measurement')
observation MultiIndex([('A', 0.4, 'I'),
            ('A', 0.5, 'Q'),
            ('B', 0.6, 'I'),
            ('B', 0.7, 'I')],
           name='measurement')
wavelength MultiIndex([('A', 0.4, 'I'),
            ('A', 0.5, 'Q'),
            ('B', 0.6, 'I'),
            ('B', 0.7, 'I')],
           name='measurement')
stokes MultiIndex([('A', 0.4, 'I'),
            ('A', 0.5, 'Q'),
            ('B', 0.6, 'I'),
            ('B', 0.7, 'I')],
           name='measurement')

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None python: 3.8.10 (default, Jan 28 2022, 09:41:12) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.10.102.1-microsoft-standard-WSL2 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.5 libnetcdf: 4.6.3

xarray: 2022.3.0 pandas: 1.4.3 numpy: 1.23.0 scipy: 1.9.0rc1 netCDF4: 1.5.4 pydap: None h5netcdf: None h5py: 3.7.0 Nio: None zarr: None cftime: 1.6.1 nc_time_axis: None PseudoNetCDF: None rasterio: 1.3b3 cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.5.2 cartopy: 0.19.0.post1 seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None setuptools: 56.0.0 pip: 21.3.1 conda: None pytest: 7.1.2 IPython: 8.4.0 sphinx: 4.5.0

INSTALLED VERSIONS

commit: None python: 3.8.10 (default, Jan 28 2022, 09:41:12) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.10.102.1-microsoft-standard-WSL2 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.5 libnetcdf: 4.6.3

xarray: 2022.6.0rc0 pandas: 1.4.3 numpy: 1.23.0 scipy: 1.9.0rc1 netCDF4: 1.5.4 pydap: None h5netcdf: None h5py: 3.7.0 Nio: None zarr: None cftime: 1.6.1 nc_time_axis: None PseudoNetCDF: None rasterio: 1.3b3 cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.5.2 cartopy: 0.19.0.post1 seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 56.0.0 pip: 21.3.1 conda: None pytest: 7.1.2 IPython: 8.4.0 sphinx: 4.5.0

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

benbovy commented, Sep 5, 2022 (1 reaction)

But finding information about those changes right now was not so easy. Is there some resource available where I can read up about the changes to indexes and the functions related to them?

Not yet, this still has to be detailed in the documentation (tracked in #6293 along with other todo items related to indexes). The Indexes API already has some basic docstrings, though: https://github.com/pydata/xarray/blob/main/xarray/core/indexes.py#L1008-L1225

benbovy commented, Sep 5, 2022 (0 reactions)

That can probably be closed then, since it was an intentional change.

Yes I think we can close it. Thanks for your feedback and for the issue report!
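
Since the change is intentional, code that expects one entry per index may need to collapse the repeated entries itself. The following is a minimal sketch, not the maintainers' recommended approach. It assumes that every coordinate name backed by the same MultiIndex maps to the same Python object, which the 2022.6.0.rc0 output above suggests but does not prove. The helper name group_names_by_index and the stand-in dictionary are hypothetical; with xarray installed, ds.indexes from the example above could be passed in instead. Recent xarray releases also appear to provide Indexes.get_unique() and Indexes.group_by_index() for the same purpose, so it is worth checking the docstrings linked above for your version.

```python
from typing import Any, Dict, List, Mapping, Tuple


def group_names_by_index(indexes: Mapping[str, Any]) -> List[Tuple[Any, List[str]]]:
    """Group coordinate names that point at the same index object."""
    groups: Dict[int, Tuple[Any, List[str]]] = {}
    for name, idx in indexes.items():
        # id() identifies the object itself, so equal-but-distinct indexes stay separate.
        _, names = groups.setdefault(id(idx), (idx, []))
        names.append(name)
    return list(groups.values())


# Stand-in for ds.indexes (assumption: all four names share one index object).
shared = [("A", 0.4, "I"), ("A", 0.5, "Q"), ("B", 0.6, "I"), ("B", 0.7, "I")]
indexes = {
    name: shared
    for name in ("measurement", "observation", "wavelength", "stokes")
}

groups = group_names_by_index(indexes)
for idx, names in groups:
    # Report each index once, followed by the other names it is listed under.
    print(names[0], idx, "(also listed as: " + ", ".join(names[1:]) + ")")
```

Grouping by identity rather than by equality keeps two distinct indexes apart even if they happen to hold the same values. If your xarray version hands out a fresh object per name, this sketch would not merge them and a comparison such as pandas' Index.equals would be needed instead.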
