MultiIndex listed multiple times in Dataset.indexes property
What happened?
When upgrading from 2022.3.0 to 2022.6.0.rc0, I noticed a possibly unintended breaking change in the Dataset.indexes property: when accessing dataset.indexes, a MultiIndex is now listed once under its own name and once more under each of its levels.
What did you expect to happen?
Same behaviour as before, see example below.
Minimal Complete Verifiable Example
# execute with 2022.3.0 and 2022.6.0.rc0 to see the differences
import pandas
import xarray as xr


def _create_multiindex(**kwargs):
    return pandas.MultiIndex.from_arrays(list(kwargs.values()), names=kwargs.keys())


ds = xr.Dataset()
ds.coords["measurement"] = _create_multiindex(
    observation=["A", "A", "B", "B"],
    wavelength=[0.4, 0.5, 0.6, 0.7],
    stokes=["I", "Q", "I", "I"],
)

for name, idx in ds.indexes.items():
    print(name, idx)
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
Relevant log output
Output with version 2022.3.0:
measurement MultiIndex([('A', 0.4, 'I'),
            ('A', 0.5, 'Q'),
            ('B', 0.6, 'I'),
            ('B', 0.7, 'I')],
           names=['observation', 'wavelength', 'stokes'])
Output with version 2022.6.0.rc0:
measurement MultiIndex([('A', 0.4, 'I'),
            ('A', 0.5, 'Q'),
            ('B', 0.6, 'I'),
            ('B', 0.7, 'I')],
           name='measurement')
observation MultiIndex([('A', 0.4, 'I'),
            ('A', 0.5, 'Q'),
            ('B', 0.6, 'I'),
            ('B', 0.7, 'I')],
           name='measurement')
wavelength MultiIndex([('A', 0.4, 'I'),
            ('A', 0.5, 'Q'),
            ('B', 0.6, 'I'),
            ('B', 0.7, 'I')],
           name='measurement')
stokes MultiIndex([('A', 0.4, 'I'),
            ('A', 0.5, 'Q'),
            ('B', 0.6, 'I'),
            ('B', 0.7, 'I')],
           name='measurement')
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None python: 3.8.10 (default, Jan 28 2022, 09:41:12) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.10.102.1-microsoft-standard-WSL2 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.5 libnetcdf: 4.6.3
xarray: 2022.3.0 pandas: 1.4.3 numpy: 1.23.0 scipy: 1.9.0rc1 netCDF4: 1.5.4 pydap: None h5netcdf: None h5py: 3.7.0 Nio: None zarr: None cftime: 1.6.1 nc_time_axis: None PseudoNetCDF: None rasterio: 1.3b3 cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.5.2 cartopy: 0.19.0.post1 seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None setuptools: 56.0.0 pip: 21.3.1 conda: None pytest: 7.1.2 IPython: 8.4.0 sphinx: 4.5.0
INSTALLED VERSIONS
commit: None python: 3.8.10 (default, Jan 28 2022, 09:41:12) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.10.102.1-microsoft-standard-WSL2 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.5 libnetcdf: 4.6.3
xarray: 2022.6.0rc0 pandas: 1.4.3 numpy: 1.23.0 scipy: 1.9.0rc1 netCDF4: 1.5.4 pydap: None h5netcdf: None h5py: 3.7.0 Nio: None zarr: None cftime: 1.6.1 nc_time_axis: None PseudoNetCDF: None rasterio: 1.3b3 cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.5.2 cartopy: 0.19.0.post1 seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 56.0.0 pip: 21.3.1 conda: None pytest: 7.1.2 IPython: 8.4.0 sphinx: 4.5.0
Not yet, this still has to be detailed in the documentation (tracked in #6293 along with other todo items related to indexes). The Indexes API already has some basic docstrings, though: https://github.com/pydata/xarray/blob/main/xarray/core/indexes.py#L1008-L1225
Yes, I think we can close it. Thanks for your feedback and for the issue report!
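
For anyone who needs the old "one entry per index" view in the meantime, here is a minimal sketch that groups the keys of an indexes-like mapping by the object they point at. It assumes that the repeated entries in ds.indexes are the very same pandas object under different names, which the output above suggests but which I have not verified. The helper name group_by_identity, the stand-in mapping and its "time" entry are hypothetical, and no xarray call is used.

```python
from typing import Any, Dict, Hashable, List, Mapping, Tuple


def group_by_identity(
    indexes: Mapping[Hashable, Any]
) -> List[Tuple[List[Hashable], Any]]:
    """Group the keys of a mapping that refer to the very same object."""
    groups: Dict[int, Tuple[List[Hashable], Any]] = {}
    for name, idx in indexes.items():
        # id() is a safe key here because the mapping keeps every object alive.
        names, _ = groups.setdefault(id(idx), ([], idx))
        names.append(name)
    return list(groups.values())


# Stand-in for ds.indexes: one shared object listed under four names
# (assumption), plus a hypothetical unrelated "time" index.
shared = object()
other = object()
fake_indexes = {
    "measurement": shared,
    "observation": shared,
    "wavelength": shared,
    "stokes": shared,
    "time": other,
}

for names, idx in group_by_identity(fake_indexes):
    print(names)
```

With a real dataset one would pass ds.indexes instead of fake_indexes. If the assumption about shared objects does not hold, every name ends up in its own group and nothing is deduplicated.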