GroupBy of stacked dim with strings renames underlying dims
Dimension names are lost (renamed) when dimensions are stacked and grouped, if one of the dimensions has string coordinates.
import numpy as np
import xarray as xr

data = np.zeros((2,1,1))
dims = ['c', 'y', 'x']
d1 = xr.DataArray(data, dims=dims)
g1 = d1.stack(f=['c', 'x']).groupby('f').first()
print('Expected dim names:')
print(g1.coords)
print()
d2 = xr.DataArray(data, dims=dims, coords={'c': ['R', 'G']})
g2 = d2.stack(f=['c', 'x']).groupby('f').first()
print('Unexpected dim names:')
print(g2.coords)
Output
In the second part below, the level names ‘f_level_0’ and ‘f_level_1’ are expected to be ‘c’ and ‘x’, respectively.
Expected dim names:
Coordinates:
* f (f) MultiIndex
- c (f) int64 0 1
- x (f) int64 0 0
Unexpected dim names:
Coordinates:
* f (f) MultiIndex
- f_level_0 (f) object 'G' 'R'
- f_level_1 (f) int64 0 0
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Jul 9 2019, 18:13:23)
[Clang 10.0.1 (clang-1001.0.46.4)]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.6.3
xarray: 0.12.3
pandas: 0.25.1
numpy: 1.17.1
scipy: 1.3.1
netCDF4: 1.5.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.1.1
cartopy: None
seaborn: None
numbagg: None
setuptools: 41.2.0
pip: 19.2.3
conda: None
pytest: None
IPython: 7.8.0
sphinx: None
Issue Analytics
- State:
- Created 4 years ago
- Comments:7 (5 by maintainers)
Top GitHub Comments
Thanks @max-sixty. Contrary to my warning about not doing a PR, I couldn’t help myself and dug in a bit. It turns out that string coordinates aren’t the problem; the problem occurs when the coordinate isn’t in sorted order. For example, @chrisroat’s original example doesn’t error if the coordinate is ["G", "R"] instead of ["R", "G"]. A more concrete WIP test:
yields
I’ll return to this tomorrow, in the meantime if this triggers any thoughts about the best path forward, that would be much appreciated!
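To make the sorted-versus-unsorted observation above easy to check on a given installation, here is a minimal sketch (not the WIP test mentioned in the comment, which is not reproduced here). The helper name `grouped_coord_names` is hypothetical and introduced only for illustration. It runs the original example with both coordinate orders and reports the coordinate names after the groupby. No expected output is shown because the result depends on the xarray version: on an affected version the unsorted case should report ‘f_level_0’ and ‘f_level_1’, while other versions may keep ‘c’ and ‘x’.

```python
import numpy as np
import xarray as xr


def grouped_coord_names(c_values):
    """Stack 'c' and 'x' into 'f', group by 'f', and return the coordinate names.

    Hypothetical helper for illustration; it mirrors the example in the issue.
    """
    data = np.zeros((2, 1, 1))
    arr = xr.DataArray(data, dims=["c", "y", "x"], coords={"c": c_values})
    grouped = arr.stack(f=["c", "x"]).groupby("f").first()
    return sorted(str(name) for name in grouped.coords)


# Same data, differing only in whether the 'c' coordinate is sorted.
sorted_names = grouped_coord_names(["G", "R"])
unsorted_names = grouped_coord_names(["R", "G"])

print("sorted 'c':  ", sorted_names)
print("unsorted 'c':", unsorted_names)
print("same names:  ", sorted_names == unsorted_names)
```

If the last line prints False, the installed version renames the levels only when the coordinate is unsorted, which matches the diagnosis above.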
Re the reordering; that’s the case, though it does reorder the dimension, not just the coord (i.e. it’s still correctly aligned). Slight change to the original example to demonstrate.
Yes that second reference looks like the place @spencerahill!