Interest in percentile function that matches behavior of Matlab / IDL?

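Matlab uses linear interpolation based on the rank of the data; see the Algorithm section of the Matlab documentation. Specifically, the k-th of n sorted values is treated as the 100 * (k - 0.5) / n percentile, and other percentiles are interpolated linearly between these points. This algorithm is also used in some IDL libraries and cannot be replicated using any of the interpolation schemes in the numpy.percentile function; see the example at the bottom of this post.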

It can easily be implemented as:

import numpy as np


def matlab_percentile(in_data, percentiles):
    """
    Calculate percentiles in the way IDL and Matlab do it.

    Uses linear interpolation between the lowest and highest rank, and
    the minimum and maximum outside that range.

    Parameters
    ----------
    in_data: numpy.ndarray
        input data
    percentiles: numpy.ndarray
        percentiles at which to calculate the values

    Returns
    -------
    perc: numpy.ndarray
        values of the percentiles
    """

    # Sort so that index k (0-based) holds the (k + 1)-th smallest value.
    data = np.sort(in_data)
    # Percent rank assigned to each sorted value: 100 * (k + 0.5) / n.
    p_rank = 100.0 * (np.arange(data.size) + 0.5) / data.size
    # Interpolate between ranks; clamp to the minimum/maximum outside them.
    perc = np.interp(percentiles, p_rank, data, left=data[0], right=data[-1])
    return perc

Example of differences between numpy and matlab:

In [2]: import numpy as np

In [3]: a = np.array([2,4,6,8,10])

In [4]: matlab_percentile(a, [25, 50, 75])
Out[4]: array([ 3.5,  6. ,  8.5])

In [5]: np.percentile(a, [25, 50, 75])
Out[5]: array([ 4.,  6.,  8.])

In [6]: np.percentile(a, [25, 50, 75], interpolation='linear')
Out[6]: array([ 4.,  6.,  8.])

In [7]: np.percentile(a, [25, 50, 75], interpolation='lower')
Out[7]: array([4, 6, 8])

In [8]: np.percentile(a, [25, 50, 75], interpolation='higher')
Out[8]: array([4, 6, 8])

In [9]: np.percentile(a, [25, 50, 75], interpolation='midpoint')
Out[9]: array([ 5.,  7.,  9.])

In [10]: np.percentile(a, [25, 50, 75], interpolation='nearest')
Out[10]: array([4, 6, 8])

Issue Analytics

  • State: closed
  • Created 8 years ago
  • Reactions: 3
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
seberg commented, May 12, 2022

I suspect the matlab version is just method="hazen" (method 5 from H&F, which is the same as in R). So I am going to close the issue.

EDIT: To be clear, our adding those methods is not an endorsement of them. I think which method to use depends on the use case, and e.g. the H&F paper suggests median_unbiased or normal_unbiased in general, IIRC.
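
The suggestion that the Matlab behavior corresponds to method="hazen" can be checked numerically. The following is a minimal sketch, assuming NumPy 1.22 or later (where the method keyword is available). It repeats the matlab_percentile function from the issue text in condensed form, and the test arrays and percentiles are illustrative choices only. Agreement on a few inputs supports the suggestion but is not a proof.

```python
import numpy as np


def matlab_percentile(in_data, percentiles):
    # Rank-based interpolation as given in the issue text (condensed).
    data = np.sort(in_data)
    p_rank = 100.0 * (np.arange(data.size) + 0.5) / data.size
    return np.interp(percentiles, p_rank, data, left=data[0], right=data[-1])


# Percentiles to compare, including the extremes where clamping applies.
q = [0, 5, 25, 50, 75, 95, 100]

# The array from the issue plus a random sample (illustrative inputs only).
a = np.array([2, 4, 6, 8, 10])
rng = np.random.default_rng(0)
b = rng.normal(size=37)

for sample in (a, b):
    expected = matlab_percentile(sample, q)
    hazen = np.percentile(sample, q, method="hazen")  # needs NumPy >= 1.22
    print(np.allclose(expected, hazen))
```

Both methods place the k-th of n sorted values (1-based) at the fraction (k - 0.5) / n and interpolate linearly in between, so the two results should agree up to floating-point rounding.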

0 reactions
rossbar commented, May 12, 2022

FWIW the interpolation methods listed in the above example are out-of-date as of v1.22. The kwarg is called method now, and there are 9 methods implemented. None of them are exactly the “matlab” version, but I think both median_unbiased and normal_unbiased are pretty close.
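
To illustrate the renamed keyword, here is a small sketch, assuming NumPy 1.22 or later, that calls np.percentile with method= on the array from the issue and measures how far a few methods are from the Matlab-style values reported there. The name matlab_reference is introduced here only for illustration.

```python
import numpy as np

a = np.array([2, 4, 6, 8, 10])
q = [25, 50, 75]

# Values reported for the Matlab-style function in the issue text.
matlab_reference = np.array([3.5, 6.0, 8.5])

# NumPy >= 1.22: pass method=... where older code passed interpolation=...
for method in ("linear", "median_unbiased", "normal_unbiased"):
    result = np.percentile(a, q, method=method)
    # Largest absolute deviation from the Matlab-style reference values.
    max_diff = np.max(np.abs(result - matlab_reference))
    print(f"{method:16s} {result} max diff = {max_diff:.3f}")
```

On this particular array the default linear method is off by 0.5 at the 25th and 75th percentiles, while the two unbiased methods stay within about 0.17 of the reference values. Other data sets may show different gaps.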
