
BUG: power_divergence raises on master but not in 1.6 or earlier


My issue is about a bug in the unreleased SciPy 1.7.0.dev0+9b9f2e8. It shows up in the statsmodels pip-pre test run.

Reproducing code example:

from numpy import array
from scipy import stats


f_obs = array([44, 74, 48, 24, 8, 2])
f_exp = array([43.93623589, 71.60431015, 52.20297275, 23.22015199, 7.16316534,
               1.59334208])
# Raises ValueError on 1.7.0.dev0; passes on SciPy 1.6 and earlier
chi2 = stats.chisquare(f_obs, f_exp)

Error message:

>               raise ValueError(msg)
E  ValueError: For each axis slice, the sum of the observed frequencies must agree with the sum of the expected 
E frequencies to a relative tolerance of 1e-08, but the percent differences are:
E  0.0014010692380163618

Scipy/Numpy/Python version information:

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 34 (20 by maintainers)

Top GitHub Comments

rkern commented, Nov 8, 2022

Your implementation is simply incorrect. Even the book you cite deals with the truncation. I also think you are missing a (1-exp) term. For the correct formulae, see the section on the gap test in Knuth's Seminumerical Algorithms, probably the most authoritative source.
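A minimal sketch of the gap-test cell probabilities as we understand the Knuth formulation: with p the probability that a single draw falls in the chosen interval, a gap of length r (for r < t) has probability p(1 - p)^r, and all gaps of length t or more share one tail cell with probability (1 - p)^t. The function names are hypothetical and this is not the skidmarks code; reading the leading factor p = 1 - q as the "(1-exp)" term mentioned above is an assumption that holds if exp denotes the per-draw miss probability.

```python
def gap_probabilities(p, t):
    """Cell probabilities for gap lengths 0..t-1 plus one tail cell (>= t).

    p: probability that one draw lands in the chosen interval.
    t: number of individual gap-length cells before the tail cell.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    q = 1.0 - p  # probability that a draw misses the interval
    # P(gap length == r) = p * q**r for r = 0 .. t-1
    cells = [p * q**r for r in range(t)]
    # P(gap length >= t) = q**t: the mass that plain truncation would drop
    cells.append(q**t)
    return cells


def gap_expected_counts(n_gaps, p, t):
    """Expected counts per cell for n_gaps observed gaps (they sum to n_gaps)."""
    return [n_gaps * prob for prob in gap_probabilities(p, t)]


print(gap_expected_counts(200, 0.5, 3))  # [100.0, 50.0, 25.0, 25.0]
```

Because the tail cell keeps the mass that truncation would drop, the probabilities sum to one (the first t cells sum to 1 - q^t), so the expected counts add up to the number of observed gaps and pass SciPy's new total check.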

josef-pkt commented, Nov 8, 2022

A guess based on a quick code check:

The skidmarks.gap_test function truncates the array of expected observations, so, AFAICS, egaps will not add up to one:

    egaps = [l * (exp ** ii) for ii in range(1, len(ogaps) + 1)]
    chi, pval = chisquare(np.array(ogaps), np.array(egaps))

In statsmodels I had a similar problem. I truncated the expected counts, e.g. for the Poisson distribution; the truncation error was negligible before the change in SciPy, but afterwards the test raised in some cases. My fix, AFAIR, was to add the missing probability to the last count, which also makes sense for the chi-square test even when the truncated amount is not negligible.
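A minimal sketch of that fix, using the Poisson case as an example: compute the probabilities for 0..k_max, then add the probability of everything above k_max to the last cell so the expected counts sum to n. The function name and parameters are hypothetical, not the statsmodels API.

```python
import math


def poisson_expected_counts(n, mu, k_max):
    """Expected Poisson(mu) counts for cells 0, 1, ..., k_max out of n draws.

    The last cell is widened to mean "k >= k_max": the probability mass
    above k_max, which plain truncation would drop, is added to it.
    """
    probs = [math.exp(-mu) * mu**k / math.factorial(k) for k in range(k_max + 1)]
    tail = 1.0 - sum(probs)  # P(X > k_max), the mass lost by truncating
    probs[-1] += tail        # fold it into the last cell
    return [n * p for p in probs]


truncated = [200 * math.exp(-1.5) * 1.5**k / math.factorial(k) for k in range(6)]
folded = poisson_expected_counts(200, 1.5, 5)
print(sum(truncated))  # falls short of 200
print(sum(folded))     # 200 up to rounding
```

For the comparison to be valid, the observed counts must be binned the same way, i.e. the last observed cell must also collect every value at or above k_max. The resulting list can then be passed as f_exp to scipy.stats.chisquare.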

Also, the chi-square test requires the expected count in each cell to be large enough, so relying on tiny cells is not good for the p-values either.
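A common remedy for small cells is to pool adjacent ones. The sketch below merges cells left to right until each pooled expected count reaches a threshold; the default of 5 is the usual rule of thumb rather than anything SciPy enforces, and the function is hypothetical. Pooling keeps both totals unchanged, so the sum check still passes, but it reduces the number of cells and therefore the degrees of freedom.

```python
def pool_small_cells(observed, expected, min_expected=5.0):
    """Merge adjacent cells until every pooled expected count >= min_expected.

    Leftover small cells at the right end are folded into the last pooled cell.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    pooled_obs, pooled_exp = [], []
    acc_obs, acc_exp = 0.0, 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        # Close the current pooled cell once its expected count is large enough
        if acc_exp >= min_expected:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
            acc_obs, acc_exp = 0.0, 0.0
    # Whatever is left over is too small on its own
    if acc_obs > 0 or acc_exp > 0:
        if pooled_exp:
            pooled_obs[-1] += acc_obs
            pooled_exp[-1] += acc_exp
        else:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
    return pooled_obs, pooled_exp


obs, pooled_e = pool_small_cells([10, 20, 3, 1], [12.0, 18.0, 3.0, 1.0])
print(obs, pooled_e)  # [10.0, 24.0] [12.0, 22.0]
```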
