GroupBy of stacked dim with strings renames underlying dims

Names for dimensions are lost (renamed) when they are stacked and grouped, if one of the dimensions has string coordinates.

import numpy as np
import xarray as xr

data = np.zeros((2,1,1))
dims = ['c', 'y', 'x']

d1 = xr.DataArray(data, dims=dims)
g1 = d1.stack(f=['c', 'x']).groupby('f').first()
print('Expected dim names:')
print(g1.coords)
print()

d2 = xr.DataArray(data, dims=dims, coords={'c': ['R', 'G']})
g2 = d2.stack(f=['c', 'x']).groupby('f').first()
print('Unexpected dim names:')
print(g2.coords)

Output

In the second part below, the levels shown as ‘f_level_0’ and ‘f_level_1’ are expected to be named ‘c’ and ‘x’, respectively.

Expected dim names:
Coordinates:
  * f        (f) MultiIndex
  - c        (f) int64 0 1
  - x        (f) int64 0 0

Unexpected dim names:
Coordinates:
  * f          (f) MultiIndex
  - f_level_0  (f) object 'G' 'R'
  - f_level_1  (f) int64 0 0
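
One plausible way level names like these can go missing can be sketched with pandas alone: a MultiIndex rebuilt from plain tuples keeps its values but not its level names, and xarray displays unnamed levels with default names of the form "<dim>_level_<n>". Whether this is the exact path taken inside xarray's groupby is an assumption, not something stated in the issue, and the index below is only a hypothetical stand-in for the stacked "f" coordinate.

```python
import pandas as pd

# Hypothetical stand-in for the stacked index "f" built from dims "c" and "x".
named = pd.MultiIndex.from_product([["R", "G"], [0]], names=["c", "x"])

# Rebuilding the index from plain (sorted) tuples keeps the values
# but drops the level names.
unnamed = pd.MultiIndex.from_tuples(sorted(named))

print(list(named.names))    # ['c', 'x']
print(list(unnamed.names))  # [None, None]

# The names can be put back explicitly once the values are known to match.
restored = unnamed.set_names(list(named.names))
print(list(restored.names))  # ['c', 'x']
```

Restoring names by hand like this is only a stopgap for inspecting results, not a fix for the underlying behaviour.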

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Jul 9 2019, 18:13:23) [Clang 10.0.1 (clang-1001.0.46.4)]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.6.3

xarray: 0.12.3
pandas: 0.25.1
numpy: 1.17.1
scipy: 1.3.1
netCDF4: 1.5.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.1.1
cartopy: None
seaborn: None
numbagg: None
setuptools: 41.2.0
pip: 19.2.3
conda: None
pytest: None
IPython: 7.8.0
sphinx: None

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
spencerahill commented, Mar 26, 2020

Thanks @max-sixty. Contrary to my warning about not doing a PR, I couldn’t help myself and dug in a bit. It turns out that string coordinates aren’t the problem, it’s when the coordinate isn’t in sorted order. For example, @chrisroat’s original example doesn’t error if the coordinate is ["G", "R"] instead of ["R", "G"]. A more concrete WIP test:

import pandas as pd
import xarray as xr


def test_stack_groupby_unsorted_coord():
    data = [[0, 1], [2, 3]]
    data_flat = [0, 1, 2, 3]
    dims = ["y", "x"]
    y_vals = [2, 3]

    # "y" coord is in sorted order, and everything works
    arr = xr.DataArray(data, dims=dims, coords={"y": y_vals})
    actual1 = arr.stack(z=["y", "x"]).groupby("z").first()
    midx = pd.MultiIndex.from_product([[2, 3], [0, 1]], names=dims)
    expected1 = xr.DataArray(data_flat, dims=["z"], coords={"z": midx})
    xr.testing.assert_equal(actual1, expected1)
    
    # Now "y" coord is NOT in sorted order, and the bug appears
    arr = xr.DataArray(data, dims=dims, coords={"y": y_vals[::-1]})
    actual2 = arr.stack(z=["y", "x"]).groupby("z").first()
    midx = pd.MultiIndex.from_product([[3, 2], [0, 1]], names=dims)
    expected2 = xr.DataArray(data_flat, dims=["z"], coords={"z": midx})
    xr.testing.assert_equal(actual2, expected2)

test_stack_groupby_unsorted_coord()

yields

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)

[...]

AssertionError: Left and right DataArray objects are not equal

Differing values:
L
    array([2, 3, 0, 1])
R
    array([0, 1, 2, 3])
Differing coordinates:
L * z        (z) MultiIndex
  - z_leve...(z) int64 2 2 3 3
  - z_leve...(z) int64 0 1 0 1
R * z        (z) MultiIndex
  - y        (z) int64 3 3 2 2
  - x        (z) int64 0 1 0 1

I’ll return to this tomorrow, in the meantime if this triggers any thoughts about the best path forward, that would be much appreciated!
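
The sorted-versus-unsorted distinction described in this comment can be checked with pandas alone. The sketch below builds the two MultiIndex objects from the test and compares their uniqueness and monotonicity. That xarray's groupby branches on exactly these properties is an assumption here, not something confirmed in the thread.

```python
import pandas as pd

dims = ["y", "x"]

# Same indexes as in the test above: "y" sorted vs. "y" reversed.
sorted_idx = pd.MultiIndex.from_product([[2, 3], [0, 1]], names=dims)
unsorted_idx = pd.MultiIndex.from_product([[3, 2], [0, 1]], names=dims)

for label, idx in [("sorted", sorted_idx), ("unsorted", unsorted_idx)]:
    # Both indexes are unique; only the first is monotonic increasing.
    print(label, bool(idx.is_unique), bool(idx.is_monotonic_increasing))
```

The same check with string labels (["G", "R"] versus ["R", "G"]) gives the same split, which matches the observation that the dtype of the coordinate is not what matters.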

0 reactions
max-sixty commented, Mar 25, 2020

Re the reordering; that’s the case, though it does reorder the dimension, not just the coord (i.e. it’s still correctly aligned). Slight change to the original example to demonstrate.

In [18]: data = np.arange(2).reshape((2,1,1))
    ...: dims = ['c', 'y', 'x']
    ...:
    ...: d1 = xr.DataArray(data, dims=dims)
    ...: g1 = d1.stack(f=['c', 'x']).groupby('f').first()
    ...: print('Expected dim names:')
    ...: print(g1.coords)
    ...: print()
    ...:
    ...: d2 = xr.DataArray(data, dims=dims, coords={'c': ['R', 'G']})
    ...: g2 = d2.stack(f=['c', 'x']).groupby('f').first()
    ...: print('Unexpected dim names:')
    ...: print(g2.coords)
Expected dim names:
Coordinates:
  * f        (f) MultiIndex
  - c        (f) int64 0 1
  - x        (f) int64 0 0

Unexpected dim names:
Coordinates:
  * f          (f) MultiIndex
  - f_level_0  (f) object 'G' 'R'
  - f_level_1  (f) int64 0 0

In [19]: d2
Out[19]:
<xarray.DataArray (c: 2, y: 1, x: 1)>
array([[[0]],

       [[1]]])
Coordinates:
  * c        (c) <U1 'R' 'G'
Dimensions without coordinates: y, x

In [20]: g2
Out[20]:
<xarray.DataArray (y: 1, f: 2)>
array([[1, 0]])
Coordinates:
  * f          (f) MultiIndex
  - f_level_0  (f) object 'G' 'R'
  - f_level_1  (f) int64 0 0
Dimensions without coordinates: y
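
The alignment point can be illustrated without xarray. In this minimal NumPy sketch, the labels and values mirror "c" and the data in d2; sorting the labels and applying the same permutation to the values changes the order but not which value belongs to which label. The variable names are illustrative only.

```python
import numpy as np

labels = np.array(["R", "G"])  # coordinate "c" in its original order
values = np.array([0, 1])      # data along "c", as in d2

order = np.argsort(labels)     # permutation that sorts the labels
sorted_labels = labels[order]
sorted_values = values[order]

# Each label maps to the same value before and after sorting.
before = dict(zip(labels.tolist(), values.tolist()))
after = dict(zip(sorted_labels.tolist(), sorted_values.tolist()))

print(sorted_labels.tolist(), sorted_values.tolist())  # ['G', 'R'] [1, 0]
print(before == after)                                 # True
```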

Yes that second reference looks like the place @spencerahill!
