Interest in percentile function that matches behavior of Matlab / IDL?
Matlab uses linear interpolation based on the rank of the data; see the Algorithm part of the Matlab documentation. This algorithm is also used in some IDL libraries and cannot be replicated using any of the interpolation schemes in the numpy.percentile function; see the example at the bottom of this post.
It can easily be implemented as:

import numpy as np


def matlab_percentile(in_data, percentiles):
    """
    Calculate percentiles in the way IDL and Matlab do it.

    Uses linear interpolation between the lowest and highest rank,
    and the minimum and maximum outside that range.

    Parameters
    ----------
    in_data: numpy.ndarray
        input data
    percentiles: numpy.ndarray
        percentiles at which to calculate the values

    Returns
    -------
    perc: numpy.ndarray
        values of the percentiles
    """
    # Sort the data so that each value can be assigned a rank.
    data = np.sort(in_data)
    # Percentile rank of the i-th sorted value (0-based): 100 * (i + 0.5) / n.
    p_rank = 100.0 * (np.arange(data.size) + 0.5) / data.size
    # Interpolate linearly between ranks; clamp to min/max outside them.
    perc = np.interp(percentiles, p_rank, data, left=data[0], right=data[-1])
    return perc
Example of differences between numpy and matlab:
In [2]: import numpy as np
In [3]: a = np.array([2,4,6,8,10])
In [4]: matlab_percentile(a, [25, 50, 75])
Out[4]: array([ 3.5, 6. , 8.5])
In [5]: np.percentile(a, [25, 50, 75])
Out[5]: array([ 4., 6., 8.])
In [6]: np.percentile(a, [25, 50, 75], interpolation='linear')
Out[6]: array([ 4., 6., 8.])
In [7]: np.percentile(a, [25, 50, 75], interpolation='lower')
Out[7]: array([4, 6, 8])
In [8]: np.percentile(a, [25, 50, 75], interpolation='higher')
Out[8]: array([4, 6, 8])
In [9]: np.percentile(a, [25, 50, 75], interpolation='midpoint')
Out[9]: array([ 5., 7., 9.])
In [10]: np.percentile(a, [25, 50, 75], interpolation='nearest')
Out[10]: array([4, 6, 8])
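The difference in the session above comes down to where each scheme places the sorted values on the percentile axis. The following is a minimal pure-Python sketch, not part of the original post: the helper `interp_at` and the names `midpoint_ranks` and `endpoint_ranks` are made up for illustration. It assumes that the Matlab-style ranks are 100 * (i + 0.5) / n, as in `matlab_percentile`, and that numpy's default linear scheme places the values at 100 * i / (n - 1).

```python
def interp_at(x, xs, ys):
    """Piecewise-linear interpolation, clamped to the first/last value."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])


data = [2, 4, 6, 8, 10]  # already sorted
n = len(data)

# Matlab-style ranks: each value sits at the middle of its 100/n-wide bin.
midpoint_ranks = [100.0 * (i + 0.5) / n for i in range(n)]  # 10, 30, 50, 70, 90

# numpy's default linear scheme: first value at 0 %, last value at 100 %.
endpoint_ranks = [100.0 * i / (n - 1) for i in range(n)]  # 0, 25, 50, 75, 100

matlab_style = [interp_at(p, midpoint_ranks, data) for p in (25, 50, 75)]
numpy_linear = [interp_at(p, endpoint_ranks, data) for p in (25, 50, 75)]

print(matlab_style)  # [3.5, 6.0, 8.5]
print(numpy_linear)  # [4.0, 6.0, 8.0]
```

With five values, the 25th percentile lands exactly on the second value under the endpoint convention, but three quarters of the way between the first and second value under the midpoint convention, which is why the two results differ.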
Issue Analytics
- State:
- Created 8 years ago
- Reactions:3
- Comments:6 (4 by maintainers)
Top GitHub Comments
I suspect the matlab version is just method="hazen" (method 5 from H&F, which is the same as R). So I am going to close the issue.

EDIT: To be clear, us adding those methods is not endorsing them. I think which method to use depends on the use-case, and e.g. the H&F paper suggests median_unbiased or normal_unbiased in general IIRC.

FWIW the interpolation methods listed in the above example are out-of-date as of v1.22. The kwarg is called method now, and there are 9 methods implemented. None of them are exactly the "matlab" version, but I think both median_unbiased and normal_unbiased are pretty close.
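One way to compare the methods named in these comments without depending on a particular NumPy version is to write out the Hyndman and Fan (alpha, beta) family by hand. The sketch below is an illustration only, not NumPy's implementation: `hf_percentile` is a hypothetical helper, the (alpha, beta) pairs are the ones the numpy.percentile documentation gives for hazen, median_unbiased, and normal_unbiased, and clamping the position to the valid index range is an assumption made to mirror the `left`/`right` arguments in `matlab_percentile`.

```python
import math


def hf_percentile(data, percent, alpha, beta):
    """Percentile from the Hyndman & Fan (alpha, beta) family (hypothetical helper)."""
    values = sorted(data)
    n = len(values)
    # 1-based virtual index q * (n - alpha - beta + 1) + alpha, shifted to 0-based.
    pos = (percent / 100.0) * (n - alpha - beta + 1) + alpha - 1
    # Assumption: clamp to the first/last element outside the data range.
    pos = min(max(pos, 0.0), n - 1.0)
    lower = math.floor(pos)
    upper = min(lower + 1, n - 1)
    frac = pos - lower
    return values[lower] + frac * (values[upper] - values[lower])


a = [2, 4, 6, 8, 10]
methods = {
    "hazen": (0.5, 0.5),
    "median_unbiased": (1 / 3, 1 / 3),
    "normal_unbiased": (3 / 8, 3 / 8),
}

results = {
    name: [hf_percentile(a, p, alpha, beta) for p in (25, 50, 75)]
    for name, (alpha, beta) in methods.items()
}

for name, values in results.items():
    print(name, values)
```

For this five-element array the hazen parameters give 3.5, 6.0, and 8.5, the same values as `matlab_percentile` in the original post, while the two unbiased variants come out close to those values but not equal. This is a single-array check rather than a proof of equivalence.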