Weighted quantile

For our work we frequently need to compute weighted quantiles. This is especially important when we need to weigh data from recent years more heavily in making predictions.

I’ve put together a function (called weighted_quantile) largely based on the source code of np.percentile. It accepts weights along a single dimension, passed as a dict w_dict that maps the dimension name to its weights. Below are some manual tests:
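
For readers unfamiliar with the idea, here is a minimal 1-D sketch of one common way to define a weighted quantile with linear interpolation. It is not the weighted_quantile function described in this issue, whose source is not shown here. The name weighted_quantile_1d and the positioning rule are assumptions, chosen so that equal weights fall back to the default behaviour of np.percentile. It assumes no NaN values and may not reproduce the weighted numbers shown further down.

```python
import numpy as np


def weighted_quantile_1d(values, q, weights):
    """Weighted quantile(s) of a 1-D sample (hypothetical helper, illustrative only)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if values.ndim != 1 or values.shape != weights.shape:
        raise ValueError("values and weights must be 1-D and of equal length")
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative")

    # Points with zero weight carry no information, so drop them.
    keep = weights > 0
    values, weights = values[keep], weights[keep]
    if values.size == 0:
        raise ValueError("at least one weight must be positive")

    # Sort the values and carry each weight along with its value.
    order = np.argsort(values)
    v, w = values[order], weights[order]
    if v.size == 1:
        return np.full(np.shape(q), v[0])

    # Put each sorted value at the midpoint of its weight interval, then
    # rescale so the smallest value sits at 0 and the largest at 1.
    # With equal weights the positions become k / (n - 1), the grid that
    # np.percentile uses for its default linear interpolation.
    mid = np.cumsum(w) - 0.5 * w
    positions = (mid - mid[0]) / (mid[-1] - mid[0])
    return np.interp(q, positions, v)


# Equal weights: should agree with np.percentile on the same data.
unweighted = weighted_quantile_1d([3, 4, 8, 1], [0.25, 0.5, 0.75], [1, 1, 1, 1])
# Heavier weight on the later entries pulls the quantiles toward them.
weighted = weighted_quantile_1d([3, 4, 8, 1], [0.25, 0.5, 0.75], [1, 2, 3, 4.0])
```

Multiplying all weights by the same constant leaves the result unchanged under this definition, since only the relative positions of the cumulative weights matter.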

When all weights equal 1, the result is identical to the unweighted quantile (np.nanpercentile):

>>> ar0
<xarray.DataArray (x: 3, y: 4)>
array([[3, 4, 8, 1],
       [5, 3, 7, 9],
       [4, 9, 6, 2]])
Coordinates:
  * x        (x) |S1 'a' 'b' 'c'
  * y        (y) int64 0 1 2 3
>>> ar0.quantile(q=[0.25, 0.5, 0.75], dim='y')
<xarray.DataArray (quantile: 3, x: 3)>
array([[ 2.5 ,  4.5 ,  3.5 ],
       [ 3.5 ,  6.  ,  5.  ],
       [ 5.  ,  7.5 ,  6.75]])
Coordinates:
  * x         (x) |S1 'a' 'b' 'c'
  * quantile  (quantile) float64 0.25 0.5 0.75
>>> weighted_quantile(da=ar0, q=[0.25, 0.5, 0.75], dim='y', w_dict={'y': [1,1,1,1]})
<xarray.DataArray (quantile: 3, x: 3)>
array([[ 2.5 ,  4.5 ,  3.5 ],
       [ 3.5 ,  6.  ,  5.  ],
       [ 5.  ,  7.5 ,  6.75]])
Coordinates:
  * x         (x) |S1 'a' 'b' 'c'
  * quantile  (quantile) float64 0.25 0.5 0.75
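
The equal-weight check above can also be reproduced in plain NumPy. The sketch below reduces a 2-D array along one axis, which is the NumPy analogue of dim='y'. The helpers _wq1d and weighted_quantile_along_axis are hypothetical names, the quantile definition is an assumption (midpoint positions with linear interpolation) rather than the code from this issue, and the helper assumes at least two values, strictly positive weights, and no NaNs.

```python
import numpy as np


def _wq1d(values, weights, q):
    """Hypothetical 1-D helper: weighted quantile by linear interpolation."""
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    mid = np.cumsum(w) - 0.5 * w               # midpoint of each weight interval
    pos = (mid - mid[0]) / (mid[-1] - mid[0])  # rescale positions to [0, 1]
    return np.interp(q, pos, v)


def weighted_quantile_along_axis(data, weights, q, axis):
    """Reduce `data` along `axis`; `weights` has one entry per position on it."""
    out = np.apply_along_axis(_wq1d, axis, data, weights, q)
    # apply_along_axis leaves the quantile axis where `axis` was; move it to
    # the front to mirror the (quantile, x) layout that xarray prints.
    return np.moveaxis(out, axis, 0)


# Same values as ar0 above, with shape (x, y).
ar0_values = np.array([[3, 4, 8, 1],
                       [5, 3, 7, 9],
                       [4, 9, 6, 2]])

result = weighted_quantile_along_axis(
    ar0_values, weights=[1, 1, 1, 1], q=[0.25, 0.5, 0.75], axis=1
)
```

With all weights equal, result should agree with np.percentile(ar0_values, [25, 50, 75], axis=1).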

Now different weights:

>>> weighted_quantile(da=ar0, q=[0.25, 0.5, 0.75], dim='y', w_dict={'y': [1,2,3,4.0]})
<xarray.DataArray (quantile: 3, x: 3)>
array([[ 3.25    ,  5.666667,  4.333333],
       [ 4.      ,  7.      ,  5.333333],
       [ 6.      ,  8.      ,  6.75    ]])
Coordinates:
  * x         (x) |S1 'a' 'b' 'c'
  * quantile  (quantile) float64 0.25 0.5 0.75

It also handles NaN values the same way np.nanpercentile does:

>>> ar
<xarray.DataArray (x: 2, y: 2, z: 2)>
array([[[ nan,   3.],
        [ nan,   5.]],

       [[  8.,   1.],
        [ nan,   0.]]])
Coordinates:
  * x        (x) |S1 'a' 'b'
  * y        (y) int64 0 1
  * z        (z) int64 8 9
>>> da_stacked = ar.stack(mi=['x', 'y'])
>>> out = weighted_quantile(da=ar, q=[0.25, 0.5, 0.75], dim=['x', 'y'], w_dict={'x': [1, 1]})
>>> out
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8.  ,  0.75],
       [ 8.  ,  2.  ],
       [ 8.  ,  3.5 ]])
Coordinates:
  * z         (z) int64 8 9
  * quantile  (quantile) float64 0.25 0.5 0.75
>>> da_stacked.quantile(q=[0.25, 0.5, 0.75], dim='mi')
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8.  ,  0.75],
       [ 8.  ,  2.  ],
       [ 8.  ,  3.5 ]])
Coordinates:
  * z         (z) int64 8 9
  * quantile  (quantile) float64 0.25 0.5 0.75
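
One way to get this NaN behaviour is to drop each NaN together with its weight before ranking. The sketch below does that for the two z slices of ar, after broadcasting weights given along x across y. nan_weighted_quantile is a hypothetical helper using an assumed quantile definition (midpoint positions with linear interpolation), not the implementation from this issue, and it assumes strictly positive weights.

```python
import numpy as np


def nan_weighted_quantile(values, q, weights):
    """Weighted quantile that ignores NaN values (hypothetical helper)."""
    values = np.asarray(values, dtype=float).ravel()
    weights = np.asarray(weights, dtype=float).ravel()

    # Drop NaN values together with their weights, in the same spirit as
    # np.nanpercentile, which ignores NaNs when computing percentiles.
    keep = ~np.isnan(values)
    v, w = values[keep], weights[keep]
    if v.size == 0:
        return np.full(np.shape(q), np.nan)

    order = np.argsort(v)
    v, w = v[order], w[order]
    if v.size == 1:
        return np.full(np.shape(q), v[0])

    mid = np.cumsum(w) - 0.5 * w
    positions = (mid - mid[0]) / (mid[-1] - mid[0])
    return np.interp(q, positions, v)


# The z=8 and z=9 slices of `ar` from the example, laid out as (x, y).
slice_z8 = np.array([[np.nan, np.nan], [8.0, np.nan]])
slice_z9 = np.array([[3.0, 5.0], [1.0, 0.0]])

# Weights given along x only are broadcast across y before flattening.
w_x = np.array([1.0, 1.0])
w_full = np.broadcast_to(w_x[:, None], slice_z9.shape)

q = [0.25, 0.5, 0.75]
res_z8 = nan_weighted_quantile(slice_z8, q, w_full)
res_z9 = nan_weighted_quantile(slice_z9, q, w_full)
```

Because the weights here are all equal, both results should agree with np.nanpercentile applied to the flattened slices.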

Lastly, other interpolation schemes give consistent results as well:

>>> out = weighted_quantile(da=ar, q=[0.25, 0.5, 0.75], dim=['x', 'y'], w_dict={'x': [1, 1]}, interpolation='nearest')
>>> out
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8.,  1.],
       [ 8.,  3.],
       [ 8.,  3.]])
Coordinates:
  * z         (z) int64 8 9
  * quantile  (quantile) float64 0.25 0.5 0.75
>>> da_stacked.quantile(q=[0.25, 0.5, 0.75], dim='mi', interpolation='nearest')
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8.,  1.],
       [ 8.,  3.],
       [ 8.,  3.]])
Coordinates:
  * z         (z) int64 8 9
  * quantile  (quantile) float64 0.25 0.5 0.75

We wonder if it’s ok to make this part of xarray. If so, the most logical place to implement it would seem to be in Variable.quantile(). Another option is to make it a utility function, to be called as xr.weighted_quantile().

Issue Analytics

  • State: open
  • Created 6 years ago
  • Comments: 8 (6 by maintainers)

Top GitHub Comments

1 reaction
chunweiyuan commented, Mar 14, 2019

So, as per suggestion by Stephan, I submitted the PR to numpy instead: https://github.com/numpy/numpy/pull/9211

Despite passing all the tests and having possibly the most comments of any active numpy PR, it has been sitting in limbo for the last few months. There seems to be considerable uncertainty about what to do with it, and the discussion has drifted off on a tangent toward more upstream issues.

My hope is still to have it eventually merged into numpy, so that it could be easily ported to xarray. It’s proven to be quite useful to my coworkers, as well as some of the numpy users. I believe it’ll also serve xarray well. Thank you all for your patience.

0 reactions
shoyer commented, Mar 20, 2019

NumPy does have a pretty bad review back-log 😦

On Fri, Mar 15, 2019 at 11:01 AM chunweiyuan <notifications@github.com> wrote:

My personal hope is to keep this thread open just for the record. But given the non-activity on the numpy end, I honestly can’t promise any resolution to this issue in the near future. Thanks!

PS I persist because some people do seem to appreciate that PR and have forked it for their own use 😃
