Incorrect quantile computation
While looking at the quantile computation, I noticed that the results do not match those returned by numpy in Python (which can be considered the reference).
There are 5 different ways to interpolate a quantile when it does not land on an exact value: linear, lower, higher, midpoint, and nearest. However, it seems that none of these matches the way simple-statistics computes percentiles.
Using data = [0, 0, 0.3, 1.2, 1.23, 3.5, 10, 12, 23.3, 32.1] and computing the 25th, 50th (median), and 75th percentiles, we get:
- simple-statistics: [ 0.3, 2.365, 12.0 ]
- simple-statistics python implementation: [0.3, 1.23, 12]
- numpy / linear: [ 0.525, 2.365, 11.5 ]
- numpy / lower: [ 0.3, 1.23, 10.0 ]
- numpy / higher: [ 1.2, 3.5, 12.0 ]
- numpy / midpoint: [ 0.75, 2.365, 11.0 ]
- numpy / nearest: [ 0.3, 1.23, 12.0 ]
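The numpy rows above can be reproduced by hand, which makes the differences between the five modes easier to see. What follows is only a minimal pure-Python sketch, not numpy's actual implementation: the function name `quantile_by_mode` is hypothetical, and the sketch assumes that every mode starts from the fractional 0-based index (n - 1) * p and that ties in "nearest" round half to even.

```python
import math

def quantile_by_mode(values, p, mode="linear"):
    """Quantile at probability p (0..1), resolved by a numpy-style mode.

    Hypothetical helper for illustration; assumes a non-empty list.
    """
    xs = sorted(values)
    pos = (len(xs) - 1) * p            # fractional 0-based index
    lo, hi = math.floor(pos), math.ceil(pos)
    if mode == "linear":
        # interpolate between the two neighbouring order statistics
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])
    if mode == "lower":
        return xs[lo]
    if mode == "higher":
        return xs[hi]
    if mode == "midpoint":
        return (xs[lo] + xs[hi]) / 2
    if mode == "nearest":
        # Python's round() rounds half to even (assumed tie rule)
        return xs[round(pos)]
    raise ValueError(f"unknown mode: {mode}")

data = [0, 0, 0.3, 1.2, 1.23, 3.5, 10, 12, 23.3, 32.1]
for mode in ["linear", "lower", "higher", "midpoint", "nearest"]:
    print(mode, [quantile_by_mode(data, p, mode) for p in (0.25, 0.5, 0.75)])
```

With the data above, the fractional indices are 2.25, 4.5 and 6.75, so every mode has to choose between two neighbouring values; none of the five choices yields the [0.3, 2.365, 12.0] reported by simple-statistics.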
For reproducibility, this is the code I used:
const ss = require('simple-statistics')
var data = [0, 0, 0.3, 1.2, 1.23, 3.5, 10, 12, 23.3, 32.1]
console.log(ss.quantile(data, [0.25, 0.5, 0.75]))
import numpy as np
import simplestatistics as ss
data = [0, 0, 0.3, 1.2, 1.23, 3.5, 10, 12, 23.3, 32.1]
for i in ['linear', 'lower', 'higher', 'midpoint', 'nearest']:
    print(i, np.percentile(data, [25, 50, 75], interpolation=i))
print(ss.quantile(data, [0.25, 0.50, 0.75]))
Issue Analytics
- Created 5 years ago
- Comments:7 (4 by maintainers)
Top GitHub Comments
Somewhat related: this comment line is a bit misleading (I was misled by it and so made some incorrect comments)
https://github.com/simple-statistics/simple-statistics/blob/1db09fc5d781fb661a2d0c7b413e3acead531696/src/quantile_sorted.js#L27
p is definitely not an integer by this point, but idx can be an integer or a float.
@tmcw From the long and well-established R stats ecosystem: the quantile function implements 9 different algorithms, of which type=7 is the default. This default agrees with the numpy / linear output for these data.
Running all types recovers the ss Python implementation as type=1 and the ss JS implementation as type=5. The documentation describes each algorithm and provides some useful references.
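To make the type=1, type=5 and type=7 correspondence concrete, here is a rough Python sketch of those three definitions as they are usually stated (Hyndman and Fan's numbering, which R follows). The function name `r_quantile` is hypothetical, only three of the nine types are covered, and the floating-point fuzz handling that R applies is left out, so treat it as an illustration rather than a port of R's code.

```python
import math

def r_quantile(values, p, qtype=7):
    """Sketch of R-style quantile types 1, 5 and 7 (hypothetical helper)."""
    xs = sorted(values)
    n = len(xs)
    if qtype == 1:
        # inverse of the empirical CDF: no interpolation at all
        k = max(1, math.ceil(n * p))
        return xs[k - 1]
    if qtype == 5:
        h = n * p + 0.5            # 1-based fractional position
    elif qtype == 7:
        h = (n - 1) * p + 1        # same position rule as numpy / linear
    else:
        raise ValueError("only types 1, 5 and 7 are sketched here")
    h = min(max(h, 1), n)          # clamp to the valid 1-based range
    lo = math.floor(h)
    hi = min(lo + 1, n)
    # linear interpolation between the two neighbouring order statistics
    return xs[lo - 1] + (h - lo) * (xs[hi - 1] - xs[lo - 1])

data = [0, 0, 0.3, 1.2, 1.23, 3.5, 10, 12, 23.3, 32.1]
for qtype in (1, 5, 7):
    print(qtype, [r_quantile(data, p, qtype) for p in (0.25, 0.5, 0.75)])
```

For the data in this issue, type 5 places the quartiles at positions 3.0, 5.5 and 8.0, which is why only the median is interpolated in the simple-statistics JS output, whereas type 7 uses positions 3.25, 5.5 and 7.75.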