Weighted quantile
See original GitHub issueFor our work we frequently need to compute weighted quantiles. This is especially important when we need to weigh data from recent years more heavily in making predictions.
I’ve put together a function (called weighted_quantile) largely based on the source code of np.percentile. It allows one to input weights along a single dimension, as a dict w_dict. Below are some manual tests:
When all weights = 1, it’s identical to using np.nanpercentile:
>>> ar0
<xarray.DataArray (x: 3, y: 4)>
array([[3, 4, 8, 1],
[5, 3, 7, 9],
[4, 9, 6, 2]])
Coordinates:
* x (x) |S1 'a' 'b' 'c'
* y (y) int64 0 1 2 3
>>> ar0.quantile(q=[0.25, 0.5, 0.75], dim='y')
<xarray.DataArray (quantile: 3, x: 3)>
array([[ 2.5 , 4.5 , 3.5 ],
[ 3.5 , 6. , 5. ],
[ 5. , 7.5 , 6.75]])
Coordinates:
* x (x) |S1 'a' 'b' 'c'
* quantile (quantile) float64 0.25 0.5 0.75
>>> weighted_quantile(da=ar0, q=[0.25, 0.5, 0.75], dim='y', w_dict={'y': [1,1,1,1]})
<xarray.DataArray (quantile: 3, x: 3)>
array([[ 2.5 , 4.5 , 3.5 ],
[ 3.5 , 6. , 5. ],
[ 5. , 7.5 , 6.75]])
Coordinates:
* x (x) |S1 'a' 'b' 'c'
* quantile (quantile) float64 0.25 0.5 0.75
Now different weights:
>>> weighted_quantile(da=ar0, q=[0.25, 0.5, 0.75], dim='y', w_dict={'y': [1,2,3,4.0]})
<xarray.DataArray (quantile: 3, x: 3)>
array([[ 3.25 , 5.666667, 4.333333],
[ 4. , 7. , 5.333333],
[ 6. , 8. , 6.75 ]])
Coordinates:
* x (x) |S1 'a' 'b' 'c'
* quantile (quantile) float64 0.25 0.5 0.75
Also handles nan values like np.nanpercentile:
>>> ar
<xarray.DataArray (x: 2, y: 2, z: 2)>
array([[[ nan, 3.],
[ nan, 5.]],
[[ 8., 1.],
[ nan, 0.]]])
Coordinates:
* x (x) |S1 'a' 'b'
* y (y) int64 0 1
* z (z) int64 8 9
>>> da_stacked = ar.stack(mi=['x', 'y'])
>>> out = weighted_quantile(da=ar, q=[0.25, 0.5, 0.75], dim=['x', 'y'], w_dict={'x': [1, 1]})
>>> out
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8. , 0.75],
[ 8. , 2. ],
[ 8. , 3.5 ]])
Coordinates:
* z (z) int64 8 9
* quantile (quantile) float64 0.25 0.5 0.75
>>> da_stacked.quantile(q=[0.25, 0.5, 0.75], dim='mi')
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8. , 0.75],
[ 8. , 2. ],
[ 8. , 3.5 ]])
Coordinates:
* z (z) int64 8 9
* quantile (quantile) float64 0.25 0.5 0.75
Lastly, different interpolation schemes are consistent:
>>> out = weighted_quantile(da=ar, q=[0.25, 0.5, 0.75], dim=['x', 'y'], w_dict={'x': [1, 1]}, interpolation='nearest')
>>> out
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8., 1.],
[ 8., 3.],
[ 8., 3.]])
Coordinates:
* z (z) int64 8 9
* quantile (quantile) float64 0.25 0.5 0.75
>>> da_stacked.quantile(q=[0.25, 0.5, 0.75], dim='mi', interpolation='nearest')
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8., 1.],
[ 8., 3.],
[ 8., 3.]])
Coordinates:
* z (z) int64 8 9
* quantile (quantile) float64 0.25 0.5 0.75
We wonder if it’s ok to make this part of xarray. If so, the most logical place to implement it would seem to be in Variable.quantile(). Another option is to make it a utility function, to be called as xr.weighted_quantile().
Issue Analytics
- State:
- Created 6 years ago
- Comments:8 (6 by maintainers)

Top Related StackOverflow Question
So, as per suggestion by Stephan, I submitted the PR to numpy instead: https://github.com/numpy/numpy/pull/9211
Despite having possibly the highest number of comments of all active numpy PRs and passing all the tests, it’s been sitting in limbo for the last few months. There seems to be quite some uncertainty about what to do with it, and the discussion has gone off tangent a bit to more upstream issues.
My hope is still to have it eventually merged into numpy, so that it could be easily ported to xarray. It’s proven to be quite useful to my coworkers, as well as some of the numpy users. I believe it’ll also serve xarray well. Thank you all for your patience.
NumPy does have a pretty bad review back-log 😦
On Fri, Mar 15, 2019 at 11:01 AM chunweiyuan notifications@github.com wrote: