Null hypothesis of Kolmogorov Smirnov test is not correctly described

The statement

Under the null hypothesis, the two distributions are identical, F(x)=G(x). The alternative hypothesis can be either ‘two-sided’ (default), ‘less’ or ‘greater’.

in the documentation of stats.kstest (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html) is correct only when alternative == 'two-sided'. The one-sided alternatives instead test the null hypothesis F >= G against F < G, and vice versa.

Same problem for stats.ks_2samp (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ks_2samp.html#scipy.stats.ks_2samp):

This is a two-sided test for the null hypothesis that 2 independent samples are drawn from the same continuous distribution.

and stats.ks_1samp (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ks_1samp.html):

Under the null hypothesis, the two distributions are identical, F(x)=G(x).

The alternatives are correctly described in the R documentation: https://stat.ethz.ch/R-manual/R-patched/library/stats/html/ks.test.html
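
To make the one-sided alternatives concrete, here is a minimal pure-Python sketch of the one-sided and two-sided KS statistics for a single sample against a reference CDF. It is not SciPy's implementation; the names normal_cdf and ks_statistics and the sample values are hypothetical, chosen only for illustration. As far as I can tell, SciPy's alternative='greater' corresponds to D+ and alternative='less' to D-, but treat that mapping as an assumption and check the documentation.

```python
import math


def normal_cdf(x):
    """CDF of the standard normal, used here as the reference distribution G."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistics(sample, cdf):
    """Return (D_plus, D_minus, D) for a sample against a reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    # D+: largest amount by which the empirical CDF F exceeds the reference G.
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    # D-: largest amount by which the reference G exceeds the empirical CDF F.
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    # The two-sided statistic is the larger of the two one-sided statistics.
    return d_plus, d_minus, max(d_plus, d_minus)


# Hypothetical sample that sits to the right of a standard normal,
# so its empirical CDF lies mostly below the reference CDF.
sample = [0.3, 0.9, 1.1, 1.6, 2.4]
d_plus, d_minus, d = ks_statistics(sample, normal_cdf)
print(f"D+ = {d_plus:.4f}, D- = {d_minus:.4f}, D = {d:.4f}")
```

For this sample D- is much larger than D+, so a one-sided test based on D- would flag the shift while one based on D+ would not. That asymmetry is why a single statement of the null as F(x)=G(x) does not describe what the one-sided variants can detect.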

Top GitHub Comments

mdhaber commented, Jul 20, 2020

This acknowledges the fact that sometimes the null hypothesis is written differently depending on the alternative, but that writing the null hypothesis the same way in all cases is also acceptable.

However, be aware that many researchers (including one of the co-authors in research work) use = in the null hypothesis, even with > or < as the symbol in the alternative hypothesis. This practice is acceptable because we only make the decision to reject or not reject the null hypothesis.

I favor the way it is currently written, as I think the distribution we use to determine the p-value should be derived from the null hypothesis (precisely as it is stated). For the KS test, does the distribution used to calculate p-values depend on the alternative hypothesis being tested? If not, I think the way the null hypothesis is written is not incorrect, regardless of the alternative.
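
One way to look at that question is through the standard large-sample approximations, sketched below in pure Python. Both formulas are derived assuming F = G, but they apply to different statistics (a one-sided D+ or D- versus the two-sided D) and give different p-values. This is not SciPy's code, which may use exact distributions; the function names and the values of n and d_obs are hypothetical.

```python
import math


def one_sided_pvalue(d_one_sided, n):
    """Asymptotic P(D+ >= d) under F = G: exp(-2 * n * d**2)."""
    return math.exp(-2.0 * n * d_one_sided ** 2)


def two_sided_pvalue(d, n, terms=100):
    """Asymptotic P(D >= d) under F = G, via the Kolmogorov series.

    The truncated series is only reliable when sqrt(n) * d is not close to 0.
    """
    x = math.sqrt(n) * d
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
        for k in range(1, terms + 1)
    )
    # Clamp to [0, 1] to guard against rounding in the truncated series.
    return min(1.0, max(0.0, 2.0 * total))


# Hypothetical observed statistic and sample size.
n, d_obs = 50, 0.2
p_one = one_sided_pvalue(d_obs, n)
p_two = two_sided_pvalue(d_obs, n)
print(f"one-sided p = {p_one:.5f}, two-sided p = {p_two:.5f}")
```

In this sketch the reference distribution is always computed at F = G, which is the boundary case of the weak-inequality null for the one-sided tests.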

josef-pkt commented, Nov 27, 2020

Sounds fine to me, then, to use the last version with a weak inequality in the null.

I’m sticking to an equality null in statsmodels, because I don’t want to get into composite nulls when we don’t need or use them.
