Shapiro test returning negative p-value

My issue is about a logical error when running the Shapiro test.

Reproducing code example:

>>> from scipy import stats
>>> trans_val, maxlog = stats.boxcox([122500,474400,110400])
>>> stats.shapiro(trans_val)
ShapiroResult(statistic=0.08333337306976318, pvalue=-1.4407120943069458)

Error message:

A p-value must lie in the range [0, 1]. The sample shown above yields a negative p-value, which is incorrect.
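The range requirement above can be turned into a small guard. This is only a sketch: `is_valid_pvalue` is a hypothetical helper name, not part of SciPy, and the value passed in is the one printed in the reproduction above.

```python
def is_valid_pvalue(p):
    """Return True only if p lies in the closed interval [0, 1].

    Hypothetical helper for illustration; NaN also fails the check
    because every comparison with NaN is False.
    """
    return 0.0 <= float(p) <= 1.0


# p-value reported in the reproduction above
reported_pvalue = -1.4407120943069458
print(is_valid_pvalue(reported_pvalue))
```

A check like this does not fix the computation; it only flags a result that cannot be a probability.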

Scipy/Numpy/Python version information:

>>> import sys, scipy, numpy; print(scipy.__version__, numpy.__version__, sys.version_info)
1.7.0 1.19.1 sys.version_info(major=3, minor=7, micro=6, releaselevel='final', serial=0)

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

1 reaction
josef-pkt commented, Jul 23, 2021

Given that y is already sorted before going to the Fortran code, it might be better to subtract the median (or the value closest to it).

The median would be more robust to outliers than the mean.

0 reactions
josef-pkt commented, Jul 23, 2021

A nonzero mean is no problem if the variation is large enough compared to it. But if the mean is huge compared to the variation, then some computations become imprecise.

I only have examples from other contexts and don’t know or remember the details for Shapiro-Wilk, so I don’t know which computation causes the numerical problem.
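As an illustration from one of those other contexts (not the Shapiro-Wilk code itself), the sketch below compares two ways of computing a variance when the mean is large relative to the spread. Single precision is an assumption chosen here to make the effect easy to see, and the function names are hypothetical.

```python
import numpy as np


def naive_variance(x):
    """Mean of squares minus square of the mean (prone to cancellation)."""
    mean = np.mean(x)
    return np.mean(x * x) - mean * mean


def centered_variance(x):
    """Subtract the mean first, then average the squared deviations."""
    deviations = x - np.mean(x)
    return np.mean(deviations * deviations)


# Small spread (true population variance 0.02) on top of a large offset,
# stored in single precision.
base = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
shifted32 = (1e4 + base).astype(np.float32)

naive = naive_variance(shifted32)
centered = centered_variance(shifted32)
print("naive:   ", naive)
print("centered:", centered)
```

In the naive form, the squared values are near 1e8, where adjacent single-precision numbers are several units apart, so a variance of 0.02 is lost in the rounding. Centering first keeps the intermediate values on the scale of the variation.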

Would you recommend subtracting the mean before passing the data to swilk?

I think that would be good. It fixes or improves precision for this kind of issue. It could be made conditional on a small relative range, as in your example, but unconditionally removing the mean is simpler.
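A minimal user-side sketch of the centering idea discussed in these comments follows. `shapiro_centered` is a hypothetical wrapper, not a SciPy function, and it does not show how SciPy itself handles the problem; the synthetic data are an assumption chosen to have a small spread on top of a large offset.

```python
import numpy as np
from scipy import stats


def shapiro_centered(x, center="mean"):
    """Run Shapiro-Wilk on data shifted so its center is near zero.

    Hypothetical helper, not part of SciPy. `center` selects which
    location estimate is subtracted before calling stats.shapiro.
    """
    x = np.asarray(x, dtype=float)
    if center == "mean":
        shift = np.mean(x)
    elif center == "median":
        shift = np.median(x)
    else:
        raise ValueError("center must be 'mean' or 'median'")
    # In exact arithmetic the W statistic does not depend on location,
    # so the shift should change only the rounding behaviour.
    return stats.shapiro(x - shift)


# Synthetic data: small spread on top of a large offset.
base = np.array([0.11, 0.25, 0.31, 0.48, 0.52, 0.77, 0.93])
result = shapiro_centered(1e6 + base, center="median")
print(result)
```

Passing `center="median"` follows the robustness argument made earlier in the thread, while the default `"mean"` matches the simpler unconditional variant.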
