assign values from `xr.groupby_bins` to new `variable`

Code Sample, a copy-pastable example if possible

# Your code here
import pandas as pd
import numpy as np
import xarray as xr

time = pd.date_range('2010-01-01','2011-12-31',freq='M')
lat = np.linspace(-5.175003, -4.7250023, 10)
lon = np.linspace(33.524994, 33.97499, 10)
precip = np.random.normal(0, 1, size=(len(time), len(lat), len(lon)))

ds = xr.Dataset(
    {'precip': (['time', 'lat', 'lon'], precip)},
    coords={
        'lon': lon,
        'lat': lat,
        'time': time,
    }
)

variable = 'precip'

# calculate a cumsum over some window size
rolling_window = 3
ds_window = (
    ds.rolling(time=rolling_window, center=True)
    .sum()
    .dropna(dim='time', how='all')
)

# construct a cumulative frequency distribution ranking the precip values
# per month
rank_norm_list = []
for mth in range(1, 13):
    ds_mth = (
        ds_window
        .where(ds_window['time.month'] == mth)
        .dropna(dim='time', how='all')
    )
    rank_norm_mth = (
        (ds_mth.rank(dim='time') - 1) / (ds_mth.time.size - 1.0) * 100.0
    )
    rank_norm_mth = rank_norm_mth.rename({variable: 'rank_norm'})
    rank_norm_list.append(rank_norm_mth)

rank_norm = xr.merge(rank_norm_list).sortby('time')

# assign bins to variable xarray
bins = [20., 40., 60., 80., np.inf]
decile_index_gpby = rank_norm.groupby_bins('rank_norm', bins=bins)
out = decile_index_gpby.assign()  # assign_coords()

Problem description

I want to calculate the Decile Index - see the ex1-Calculate Decile Index (DI) with Python.ipynb.

The pandas implementation is simple enough but I need help with applying the bin labels to a new variable / coordinate.

Expected Output

<xarray.Dataset>
Dimensions:   (lat: 10, lon: 10, time: 24)
Coordinates:
  * time      (time) datetime64[ns] 2010-01-31 2010-02-28 ... 2011-12-31
  * lat       (lat) float32 -5.175003 -5.125 -5.075001 ... -4.7750015 -4.7250023
  * lon       (lon) float32 33.524994 33.574997 33.625 ... 33.925003 33.97499
Data variables:
    precip    (time, lat, lon) float32 4.6461554 4.790813 ... 7.3063064 7.535994
    rank_bin  (lat, lon, time) int64 1 3 3 0 1 4 2 3 0 1 ... 0 4 0 1 3 1 2 2 3 1
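
A minimal sketch of one way to produce a `rank_bin` variable like the one above, assuming the goal is an integer bin index per grid cell. It swaps `groupby_bins` for `np.digitize` applied through `xr.apply_ufunc`; the small random dataset and the `edges` list are illustrative assumptions, not part of the original issue.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the `rank_norm` dataset built in the code sample:
# percentile ranks between 0 and 100 on a small (time, lat, lon) grid.
rng = np.random.default_rng(0)
rank_norm = xr.Dataset(
    {"rank_norm": (("time", "lat", "lon"), rng.uniform(0, 100, size=(4, 2, 2)))}
)

# Interior bin edges (an assumption): values below 20 map to 0,
# 20 up to 40 map to 1, and so on; values of 80 or more map to 4.
edges = [20.0, 40.0, 60.0, 80.0]

# np.digitize works elementwise on N-d input, so apply_ufunc can wrap it
# directly and keep the dimensions and coordinates of the input.
rank_bin = xr.apply_ufunc(
    np.digitize, rank_norm["rank_norm"], kwargs={"bins": edges}
)

# Attach the bin index as a new data variable next to the ranks.
out = rank_norm.assign(rank_bin=rank_bin)
print(out)
```

This sketch does not treat missing values specially, so any NaN ranks would need to be masked separately.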

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.0 | packaged by conda-forge | (default, Nov 12 2018, 12:34:36) [Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.3.0
netCDF4: 1.5.1.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: 1.0.17
cfgrib: 0.9.7
iris: None
bottleneck: 1.2.1
dask: 1.2.2
distributed: 1.28.1
matplotlib: 3.1.0
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 41.0.1
pip: 19.1
conda: None
pytest: 4.5.0
IPython: 7.1.1
sphinx: 2.0.1

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

rabernat commented, Jun 7, 2019

If you just want different coordinates for the result of groupby_bins, you can pass the labels keyword. See example here: http://xarray.pydata.org/en/stable/groupby.html#binning
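
As a rough illustration of the `labels` keyword mentioned above, here is a minimal sketch that follows the pattern in the linked binning docs. The toy `precip` array, the `rank_norm` coordinate values, and the bin edges and labels are all assumptions made for the example.

```python
import numpy as np
import xarray as xr

# Hypothetical data: six precip values, each tagged with a percentile rank
# stored as a non-index coordinate along the same dimension.
precip = xr.DataArray(
    np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]),
    dims="time",
    coords={"rank_norm": ("time", [5.0, 25.0, 45.0, 65.0, 85.0, 95.0])},
    name="precip",
)

# Five right-closed bins covering 0-100, labelled 0..4 (assumed values).
bins = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
labels = [0, 1, 2, 3, 4]

# With `labels`, the resulting `rank_norm_bins` coordinate holds the labels
# instead of pandas Interval objects.
binned_mean = precip.groupby_bins("rank_norm", bins=bins, labels=labels).mean()
print(binned_mean)
```

Note that this relabels the groups of a reduced result; it does not by itself write a per-cell bin index back onto the original grid.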

tommylees112 commented, Jul 7, 2019

Perfect, thank you!
