Shapiro test returning negative p-value
See original GitHub issue. My issue is about a logical error when running the Shapiro-Wilk test.
Reproducing code example:
>>> from scipy import stats
>>> trans_val, maxlog = stats.boxcox([122500,474400,110400])
>>> stats.shapiro(trans_val)
ShapiroResult(statistic=0.08333337306976318, pvalue=-1.4407120943069458)
Error message:
The valid range for a p-value is [0, 1]. The sample above yields a negative p-value, which is incorrect.
Scipy/Numpy/Python version information:
>>> import sys, scipy, numpy; print(scipy.__version__, numpy.__version__, sys.version_info)
1.7.0 1.19.1 sys.version_info(major=3, minor=7, micro=6, releaselevel='final', serial=0)
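On affected versions, a practical workaround is to center the data before calling the test, since the Box-Cox-transformed values have a tiny spread relative to their magnitude. A hedged sketch (the exact statistic and p-value depend on the SciPy version):

```python
import numpy as np
from scipy import stats

# Box-Cox-transformed data from the report: huge mean, tiny relative spread.
trans_val, maxlog = stats.boxcox([122500, 474400, 110400])

# Workaround: subtract the median before testing. A location shift leaves
# the Shapiro-Wilk statistic mathematically unchanged, but avoids the
# loss of precision when the mean dwarfs the variation.
centered = trans_val - np.median(trans_val)
res = stats.shapiro(centered)
print(res.statistic, res.pvalue)  # p-value now falls in [0, 1]
```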
Issue Analytics
- Created 2 years ago
- Comments: 7 (6 by maintainers)
Given that y is already sorted before being passed to the Fortran code, it might be better to subtract the median (or the next value close to it). The median would be more robust to outliers than the mean.
A nonzero mean is no problem if the variation is large enough compared to it. But if the mean is huge compared to the variation, some computations become imprecise. I only have examples from other contexts and don't know or remember the details for Shapiro-Wilk, so I don't know which computation causes the numerical problem.
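A classic illustration of this failure mode (not necessarily the exact computation inside shapiro) is the one-pass "mean of squares minus square of mean" variance formula, which cancels catastrophically when the mean dwarfs the spread:

```python
import numpy as np

# Three float32 values with a large mean and a spread of only +/-1.
x = np.array([100001.0, 100002.0, 100003.0], dtype=np.float32)

# One-pass formula: E[x^2] - E[x]^2 subtracts two nearly equal ~1e10
# terms, so almost all significant digits cancel.
naive_var = np.mean(x**2) - np.mean(x)**2

# Two-pass formula: remove the mean first, then square the residuals.
centered_var = np.mean((x - np.mean(x))**2)

print(naive_var)     # garbage (the true variance is 2/3)
print(centered_var)  # close to 0.6666667
```

This is the same reason centering the input before the Shapiro-Wilk computation helps: the residuals carry all the information, and the large offset only burns precision.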
I think that would be good. It fixes or improves precision for these kinds of issues. It would be possible to make it conditional on a small relative range, as in your example, but unconditionally removing the mean is simpler.