Null hypothesis of Kolmogorov-Smirnov test is not correctly described
The statement
Under the null hypothesis, the two distributions are identical, F(x)=G(x). The alternative hypothesis can be either ‘two-sided’ (default), ‘less’ or ‘greater’.
in the documentation of stats.kstest (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html) is not correct unless alternative == 'two-sided'. The other alternatives test F >= G vs. F < G, and vice versa.
The same problem applies to stats.ks_2samp (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ks_2samp.html#scipy.stats.ks_2samp):
This is a two-sided test for the null hypothesis that 2 independent samples are drawn from the same continuous distribution.
and to stats.ks_1samp (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ks_1samp.html):
Under the null hypothesis, the two distributions are identical, F(x)=G(x).
The alternatives are correctly described in the R documentation: https://stat.ethz.ch/R-manual/R-patched/library/stats/html/ks.test.html
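A minimal sketch of the point above: each value of alternative in stats.kstest leads to a different statistic and p-value, so a single "F(x)=G(x)" description does not capture what the one-sided variants test. The sample, seed, and shift of 0.5 are arbitrary choices for illustration, and the pairing of 'greater' with D+ and 'less' with D- reflects a reading of SciPy's behavior rather than a statement from its documentation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Sample shifted to the right of the reference distribution N(0, 1).
x = np.sort(rng.normal(loc=0.5, scale=1.0, size=200))
n = len(x)
ref_cdf = stats.norm.cdf(x)  # reference CDF evaluated at the sorted sample

# One-sided statistics computed by hand from the empirical CDF (ECDF).
d_plus = np.max(np.arange(1, n + 1) / n - ref_cdf)  # largest excess of the ECDF over the reference CDF
d_minus = np.max(ref_cdf - np.arange(0, n) / n)     # largest excess of the reference CDF over the ECDF

results = {
    alt: stats.kstest(x, "norm", alternative=alt)
    for alt in ("two-sided", "less", "greater")
}
for alt, res in results.items():
    print(f"{alt:>9}: statistic={res.statistic:.4f}, pvalue={res.pvalue:.3g}")
```

Because the sample lies to the right of the reference distribution, its ECDF sits below the reference CDF, so only one of the two one-sided tests should reject.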
This acknowledges the fact that sometimes the null hypothesis is written differently depending on the alternative, but that writing the null hypothesis the same way in all cases is also acceptable.
I favor the way it is currently written, as I think the distribution we use to determine the p-value should be derived from the null hypothesis (precisely as it is stated). For the KS test, does the distribution used to calculate p-values depend on the alternative hypothesis being tested? If not, I think that the way the null hypothesis is written is not incorrect regardless of the alternative.
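One way to probe this question numerically is to compare the finite-sample distributions SciPy exposes for the one-sided and two-sided statistics, stats.ksone and stats.kstwo. Both are distributions of a statistic under the equality null F = G; they differ because the statistic itself differs (D+ or D- versus their maximum). The values of n and d below are hypothetical, and whether kstest draws its p-values from exactly these distributions depends on its method setting, so treat this as a sketch rather than a description of its internals.

```python
from scipy import stats

n = 100  # hypothetical sample size
d = 0.1  # hypothetical observed value of the statistic

# Tail probability of the one-sided statistic (D+ or D-) under F = G.
p_one_sided = stats.ksone.sf(d, n)
# Tail probability of the two-sided statistic max(D+, D-) under F = G.
p_two_sided = stats.kstwo.sf(d, n)

print(f"one-sided tail: {p_one_sided:.4f}")
print(f"two-sided tail: {p_two_sided:.4f}")
```

The two tail probabilities differ for the same d, even though both reference distributions are computed under the same equality null.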
It sounds fine to me, then, to use the last version, with the weak inequality in the null.
I’m sticking to equality null in statsmodels, because I don’t want to get into composite nulls when we don’t need or use it.