Shapiro test returning negative p-value
See original GitHub issue. My issue is about a logical error when running the Shapiro-Wilk test.
Reproducing code example:
>>> from scipy import stats
>>> trans_val, maxlog = stats.boxcox([122500,474400,110400])
>>> stats.shapiro(trans_val)
ShapiroResult(statistic=0.08333337306976318, pvalue=-1.4407120943069458)
Error message:
The valid range for a p-value is [0, 1]. The sample above yields a negative p-value, which is incorrect.
Scipy/Numpy/Python version information:
>>> import sys, scipy, numpy; print(scipy.__version__, numpy.__version__, sys.version_info)
1.7.0 1.19.1 sys.version_info(major=3, minor=7, micro=6, releaselevel='final', serial=0)
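On affected versions, a practical workaround is to center the data before calling the test, since the Box-Cox-transformed values have a tiny spread relative to their magnitude. A hedged sketch (the exact statistic and p-value depend on the SciPy version):

```python
import numpy as np
from scipy import stats

# Box-Cox-transformed data from the report: huge mean, tiny relative spread.
trans_val, maxlog = stats.boxcox([122500, 474400, 110400])

# Workaround: subtract the median before testing. A location shift leaves
# the Shapiro-Wilk statistic mathematically unchanged, but avoids the
# loss of precision when the mean dwarfs the variation.
centered = trans_val - np.median(trans_val)
res = stats.shapiro(centered)
print(res.statistic, res.pvalue)  # p-value now falls in [0, 1]
```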
Issue Analytics
- Created 2 years ago
- Comments: 7 (6 by maintainers)
Given that y is already sorted before being passed to the Fortran code, it might be better to subtract the median (or the next value close to it). The median would be more robust to outliers than the mean.
A nonzero mean is no problem if the variation is large enough compared to it. But if the mean is huge compared to the variation, some computations become imprecise. I only have examples from other contexts and don't know or remember the details for Shapiro-Wilk, so I don't know which computation causes the numerical problem.
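A classic illustration of this failure mode (not necessarily the exact computation inside shapiro) is the one-pass "mean of squares minus square of mean" variance formula, which cancels catastrophically when the mean dwarfs the spread:

```python
import numpy as np

# Three float32 values with a large mean and a spread of only +/-1.
x = np.array([100001.0, 100002.0, 100003.0], dtype=np.float32)

# One-pass formula: E[x^2] - E[x]^2 subtracts two nearly equal ~1e10
# terms, so almost all significant digits cancel.
naive_var = np.mean(x**2) - np.mean(x)**2

# Two-pass formula: remove the mean first, then square the residuals.
centered_var = np.mean((x - np.mean(x))**2)

print(naive_var)     # garbage (the true variance is 2/3)
print(centered_var)  # close to 0.6666667
```

This is the same reason centering the input before the Shapiro-Wilk computation helps: the residuals carry all the information, and the large offset only burns precision.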
I think that would be good. It fixes or improves precision for these kinds of issues. It would be possible to make it conditional on a small relative range, as in your example, but unconditionally removing the mean is simpler.