Incorrect quantile computation

While looking at the quantile computation, I noticed that the results do not match those returned by numpy in Python (which can be considered the reference).

There are 5 different ways to interpolate a quantile that does not land exactly on a data point: linear, lower, higher, midpoint, and nearest. However, none of these seems to match the way simple-statistics computes percentiles.

Using data = [0, 0, 0.3, 1.2, 1.23, 3.5, 10, 12, 23.3, 32.1] and computing the 25th, 50th (median), and 75th percentiles, we get:

  • simple-statistics: [ 0.3, 2.365, 12.0 ]
  • simple-statistics python implementation: [0.3, 1.23, 12]
  • numpy / linear: [ 0.525, 2.365, 11.5 ]
  • numpy / lower: [ 0.3, 1.23, 10.0 ]
  • numpy / higher: [ 1.2, 3.5, 12.0 ]
  • numpy / midpoint: [ 0.75, 2.365, 11.0 ]
  • numpy / nearest: [ 0.3, 1.23, 12.0 ]
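
The five numpy rows above can be reproduced by hand. The following is a minimal sketch, not numpy's actual implementation: it assumes the fractional position is h = (n - 1) * p on the sorted data and that `nearest` resolves ties toward the even index, which is consistent with the 1.23 reported for the median. The function name `percentile` is made up for illustration.

```python
import math

def percentile(values, p, method="linear"):
    """Quantile at fraction p (0..1), using the position h = (n - 1) * p."""
    x = sorted(values)
    h = (len(x) - 1) * p                  # fractional 0-based index
    lo, hi = math.floor(h), math.ceil(h)  # neighbouring data points
    if method == "lower":
        return x[lo]
    if method == "higher":
        return x[hi]
    if method == "midpoint":
        return (x[lo] + x[hi]) / 2
    if method == "nearest":
        return x[round(h)]                # Python's round() sends ties to the even index
    if method == "linear":
        return x[lo] + (h - lo) * (x[hi] - x[lo])
    raise ValueError(f"unknown method: {method}")

data = [0, 0, 0.3, 1.2, 1.23, 3.5, 10, 12, 23.3, 32.1]
for m in ["linear", "lower", "higher", "midpoint", "nearest"]:
    print(m, [round(percentile(data, p, m), 3) for p in (0.25, 0.5, 0.75)])
```

With n = 10 the positions are 2.25, 4.5, and 6.75, so every method has to choose between two neighbouring values, which is why the five rows differ.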

For reproducibility, this is the code I used:

// JavaScript (simple-statistics)
const ss = require('simple-statistics')

var data = [0, 0, 0.3, 1.2, 1.23,  3.5, 10, 12, 23.3, 32.1]
console.log(ss.quantile(data, [0.25, 0.5, 0.75]))

# Python (numpy and the simplestatistics package)
import numpy as np
import simplestatistics as ss

data = [0, 0, 0.3, 1.2, 1.23,  3.5, 10, 12, 23.3, 32.1]
for i in ['linear', 'lower', 'higher', 'midpoint', 'nearest']:
    # Note: NumPy 1.22 and later rename the `interpolation` keyword to `method`
    print(i, np.percentile(data, [25, 50, 75], interpolation=i))

print(ss.quantile(data, [0.25, 0.50, 0.75]))

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

yiyange commented, Feb 26, 2021 (1 reaction)

Somewhat related: this comment line is a bit misleading (I was misled by it and so made some incorrect comments):

https://github.com/simple-statistics/simple-statistics/blob/1db09fc5d781fb661a2d0c7b413e3acead531696/src/quantile_sorted.js#L27

p is definitely not an integer by this point, but idx can be an integer or a float.
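
To make the p versus idx distinction concrete, here is a rough sketch of one rule that reproduces the simple-statistics output reported in the issue ([0.3, 2.365, 12]). It is an assumption about the behaviour, not a copy of quantile_sorted.js: it takes idx = n * p, steps up to the next data point when idx is fractional, and averages the two neighbours when idx is a whole number. The name `step_quantile` is hypothetical.

```python
import math

def step_quantile(values, p):
    """Quantile that branches on whether idx = n * p is a whole number."""
    x = sorted(values)
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if p == 0:
        return x[0]
    if p == 1:
        return x[-1]
    idx = len(x) * p                  # p is fractional here; idx may or may not be
    if idx % 1 != 0:
        return x[math.ceil(idx) - 1]  # fractional idx: take the next data point
    idx = int(idx)
    return (x[idx - 1] + x[idx]) / 2  # whole idx: average the two neighbours

data = [0, 0, 0.3, 1.2, 1.23, 3.5, 10, 12, 23.3, 32.1]
print([step_quantile(data, p) for p in (0.25, 0.5, 0.75)])
```

For these data idx is 2.5, 5, and 7.5, so only the median takes the averaging branch.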

rbox-risk commented, Sep 3, 2021 (0 reactions)

@tmcw In the long-established R stats ecosystem, the quantile function implements 9 different algorithms, of which type=7 is the default. This default agrees with the numpy / linear output for these data.

rdata <- c(0, 0, 0.3, 1.2, 1.23, 3.5, 10, 12, 23.3, 32.1)
quantile(rdata, c(0.25,0.5,0.75))

gives

   25%    50%    75% 
 0.525  2.365 11.500 

Running all types

t(sapply(1:9, function(i) quantile(rdata, c(0.25,0.5,0.75), type=i)))

gives output (with annotations):

          25%   50%      75%
 [1,] 0.30000 1.230 12.00000 (ss python)
 [2,] 0.30000 2.365 12.00000
 [3,] 0.00000 1.230 12.00000
 [4,] 0.15000 1.230 11.00000
 [5,] 0.30000 2.365 12.00000 (ss)
 [6,] 0.22500 2.365 14.82500
 [7,] 0.52500 2.365 11.50000 (R default, numpy default/linear)
 [8,] 0.27500 2.365 12.94167
 [9,] 0.28125 2.365 12.70625

This recovers the ss Python implementation as type=1 and the ss JS implementation as type=5. The R documentation describes each algorithm and provides some useful references.
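
As a rough illustration of how the three annotated rows arise, the sketch below implements types 1, 5, and 7 using the 1-based positions from the Hyndman and Fan definitions that R's documentation cites: ceil(n * p) for type 1, n * p + 0.5 for type 5, and (n - 1) * p + 1 for type 7. It is not R's code (R also applies a small numerical fuzz that is omitted here), and the name `hf_quantile` is made up.

```python
import math

def hf_quantile(values, p, qtype=7):
    """Sample quantile for Hyndman-Fan types 1, 5 and 7 (1-based positions)."""
    x = sorted(values)
    n = len(x)
    if qtype == 1:
        k = max(1, math.ceil(n * p))  # inverse of the empirical CDF
        return x[k - 1]
    if qtype == 5:
        h = n * p + 0.5               # knots at (k - 0.5) / n
    elif qtype == 7:
        h = (n - 1) * p + 1           # knots at (k - 1) / (n - 1)
    else:
        raise ValueError("only types 1, 5 and 7 are sketched here")
    h = min(max(h, 1.0), float(n))    # clamp to the valid 1..n range
    lo = math.floor(h)
    hi = min(lo + 1, n)
    # Linear interpolation between the two neighbouring order statistics
    return x[lo - 1] + (h - lo) * (x[hi - 1] - x[lo - 1])

data = [0, 0, 0.3, 1.2, 1.23, 3.5, 10, 12, 23.3, 32.1]
for t in (1, 5, 7):
    print(t, [round(hf_quantile(data, p, t), 3) for p in (0.25, 0.5, 0.75)])
```

Types 5 and 7 use the same interpolation and differ only in where they place the position h, which is enough to move the 25th percentile from 0.3 to 0.525 on these data.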
