scipy.stats.chisquare test does not check that observed and expected frequencies add to same total
The function scipy.stats.chisquare performs the chi-square test on a vector of observed and expected frequencies. For the test to make sense, the observed and expected frequency vectors must sum to the same total; otherwise the result is nonsense, because the inputs are incompatible. So there are two options: either an error should be raised if the totals differ, or f_exp needs to be rescaled so that it sums to the total of f_obs. The first is probably better, or the second with a warning.
Reproducing code example:
# In the following example, one vector is an exact multiple of the other.
# This means that the observed and expected frequencies are exactly
# proportional, which should give a p-value of 1 (not significant at all).
# Instead, you get the following:
from scipy.stats import chisquare
chisquare(f_obs=[10,20], f_exp=[30,60])
# Power_divergenceResult(statistic=40.0, pvalue=2.5396285894708634e-10)
# The statistic of 40 is calculated as follows, directly from the formula:
# ((10-30)**2 / 30) + ((20-60)**2 / 60) = 40
# This is then plugged into a chi-squared distribution to get a p-value
# close to 0, which is the opposite of the significance you should get.
# Instead, here is what should happen:
import numpy as np
fobs = np.array([10,20])
fexp = np.array([30,60])
# Rescale the expected values so their total matches the observed total.
# The next line gives array([10., 20.]), the same as observed.
fexp = fexp * (np.sum(fobs)/np.sum(fexp))
chisquare(f_obs=fobs, f_exp=fexp)
# the correct result
# Power_divergenceResult(statistic=0.0, pvalue=1.0)
Scipy/Numpy/Python version information:
1.4.1 1.18.4 sys.version_info(major=3, minor=7, micro=4, releaselevel='final', serial=0)
Issue Analytics
- State:
- Created 3 years ago
- Comments:15 (9 by maintainers)
AFAIU, changing this was a mistake. The chi-square test as described on Wikipedia (https://en.wikipedia.org/wiki/Chi-squared_test) does not require that the sum of the observations equal the sum of the expectations. Wikipedia derives it under the assumption that sum(x) == sum(m), but even without enforcing this condition, if the x are sampled from the m, the statistic has a chi-square distribution in the asymptotic limit. The original implementation was correct, and breaking everyone's code that relied on the previous behavior is not great.
This is simply wrong.
The problem is rather the documentation of the chisquare function, which speaks of f_obs and f_exp as frequencies, although the function accepted counts, not frequencies. Frequencies must sum to 1, but counts do not have to.
😃
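One way to probe the claim above about unconstrained totals is a small simulation. The sketch below assumes one particular reading of "the x are sampled from the m": each count is an independent Poisson draw with mean m_i, so the total is not fixed. Under that assumption each term (x_i - m_i)**2 / m_i has expectation 1, so the statistic averages k, the number of categories. Note that the SciPy documentation states that chisquare computes its p-value with k - 1 - ddof degrees of freedom. The helper names, trial count, and seed are all illustrative choices, and `poisson_sample` is a plain-Python stand-in (Knuth's method) for a library sampler.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def pearson_statistic(observed, expected):
    """Pearson statistic: sum((O - E)**2 / E) over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def mean_statistic(expected, trials=5000, seed=0):
    """Average the statistic over many independent Poisson samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        observed = [poisson_sample(m, rng) for m in expected]
        total += pearson_statistic(observed, expected)
    return total / trials


expected = [30, 60]
# Under the independent-Poisson assumption this should land near
# len(expected), i.e. near 2, rather than near 1.
print(mean_statistic(expected))
```

If the total is instead held fixed (multinomial sampling), the usual k - 1 degrees of freedom apply, so which behavior is "correct" depends on which sampling model the caller has in mind.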