Overflow Error from binned_statistic_dd

Error message:

---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-86-0811ef0d48f7> in <module>()
     16         sl = SpectrumList([x for x in spectrum_list if x.id in ids])
     17         print sl.tolist()
---> 18         sl_df = make_spectrumlist(sl)
     19 
     20         pls_df = pd.concat([sl_df, mtd["Diagnosis"]], axis=1).dropna(axis=0)

<ipython-input-63-51c201b1e4f9> in make_spectrumlist(sl)
      1 def make_spectrumlist(sl):
----> 2     sl.binning(bin_size=0.5, mass_statistic="max")
      3     sl.value_imputation(method="basic", threshold=0.1)
      4     sl.normalise(method="tic")
      5     sl.transform(method="log10")

/home/keo7/ProjectsNew/metabolomics/denisa-saliva/env/local/lib/python2.7/site-packages/dimepy/SpectrumList.pyc in binning(self, bin_size, int_statistic, mass_statistic, inplace)
    174 
    175         for spectrum in self._spectrum:
--> 176             bins, intensities = _apply_binning(spectrum, bins)
    177             if inplace:
    178                 spectrum.masses = bins

/home/keo7/ProjectsNew/metabolomics/denisa-saliva/env/local/lib/python2.7/site-packages/dimepy/SpectrumList.pyc in _apply_binning(spectrum, bins)
    160                 spectrum.intensities,
    161                 bins=bins,
--> 162                 statistic=int_statistic)
    163             indx = np.invert(np.isnan(b_i))
    164 

/home/keo7/ProjectsNew/metabolomics/denisa-saliva/env/local/lib/python2.7/site-packages/scipy/stats/_binned_statistic.pyc in binned_statistic(x, values, statistic, bins, range)
    177 
    178     medians, edges, binnumbers = binned_statistic_dd(
--> 179         [x], values, statistic, bins, range)
    180 
    181     return BinnedStatisticResult(medians, edges[0], binnumbers)

/home/keo7/ProjectsNew/metabolomics/denisa-saliva/env/local/lib/python2.7/site-packages/scipy/stats/_binned_statistic.pyc in binned_statistic_dd(sample, values, statistic, bins, range, expand_binnumbers)
    534     for i in xrange(Ndim):
    535         # Find the rounding precision
--> 536         decimal = int(-np.log10(dedges[i].min())) + 6
    537         # Find which points are on the rightmost edge.
    538         on_edge = np.where(np.around(sample[:, i], decimal) ==

OverflowError: cannot convert float infinity to integer

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 2
  • Comments: 10 (9 by maintainers)

Top GitHub Comments

rlucas7 commented, Aug 13, 2019 (1 reaction)

I looked at this a little more today because I wasn’t satisfied that the error had been fully reproduced. Judging by the stack trace from @KeironO, it looks like one of the edges of the bins was numerically 0. For example, if I run:

c4b301c872df:~ rlucas$ python3
Python 3.6.2 |Anaconda custom (64-bit)| (default, Sep 21 2017, 18:29:43) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> int(-np.log10(0.0)) + 6
__main__:1: RuntimeWarning: divide by zero encountered in log10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: cannot convert float infinity to integer
>>> 

That reproduces the last part of the stack trace posted. The issue is that a 0 goes into:

https://github.com/scipy/scipy/blob/master/scipy/stats/_binned_statistic.py#L583

which looks like it gets produced by one of the calculations here:

https://github.com/scipy/scipy/blob/master/scipy/stats/_binned_statistic.py#L546-L567

AFAIK the bin calculation doesn’t restrict the bin edges to be non-zero. For this function the bin edges need to be non-zero for the logarithm.
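
For illustration, here is a minimal sketch of how a zero can reach that logarithm. The traceback shows the call as `-np.log10(dedges[i].min())`; the sketch assumes (this is not confirmed anywhere in the thread) that `dedges` holds the differences between consecutive bin edges, so a repeated edge value would yield a width of 0. The `edges` array is made-up data.

```python
import numpy as np

# Hypothetical bin edges containing a repeated value (0.5 appears twice).
edges = np.array([0.0, 0.5, 0.5, 1.0])

# Assumption: dedges-style widths are the differences of consecutive edges.
widths = np.diff(edges)  # one of these widths is exactly 0.0

# Silence the divide-by-zero RuntimeWarning seen in the session above.
with np.errstate(divide="ignore"):
    precision = -np.log10(widths.min())  # -log10(0.0) is +inf

try:
    decimal = int(precision) + 6
    message = ""
except OverflowError as exc:
    # Same exception type as at the bottom of the posted stack trace.
    message = str(exc)

print(message)
```

If that assumption holds, any input that produces two identical edges would trigger the same failure, regardless of whether an edge itself sits at 0.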

There are a couple of ways to fix this part: either check for a 0 bin edge and shift it slightly, or make the bin calculation not return a bin edge that is numerically 0.

The check-and-shift approach is easier to implement and would result in slightly different bin widths. The fix in the bin calculation is vaguer; shifting all bins by an epsilon so that no bin edge is 0 would be the most straightforward way to do it.
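
A rough sketch of what a check-and-shift guard around the failing expression might look like is below. `rounding_decimals` and its `shift` parameter are hypothetical names invented for this illustration; they are not part of SciPy, and this is not the fix that was eventually merged. With no `shift` the helper refuses a zero value outright; with a `shift` it substitutes a small positive number before taking the logarithm.

```python
import math


def rounding_decimals(widths, shift=None):
    """Hypothetical guard around int(-log10(min(widths))) + 6.

    widths: iterable of non-negative floats (illustrative input).
    shift:  optional small positive replacement for a zero minimum.
    """
    smallest = min(widths)
    if smallest <= 0:
        if shift is None:
            # "Check" variant: fail early with a readable message.
            raise ValueError("smallest value is numerically 0")
        # "Shift" variant: nudge the zero to a small positive number.
        smallest = shift
    return int(-math.log10(smallest)) + 6


print(rounding_decimals([0.5, 0.5]))               # ordinary input
print(rounding_decimals([0.5, 0.0], shift=5e-10))  # zero replaced by shift
```

Raising keeps the caller's bins untouched and makes the problem visible, while shifting lets the computation continue at the cost of a precision that depends on the chosen `shift`.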

rlucas7 commented, Aug 15, 2019 (0 reactions)

Quick status update: I’ve got a reproducing test and am throwing ValueError for np.nan and np.inf. I’m still working on the reproducing example and fix for the 0 bin edge defect. I expect to have a PR (with both tests and fixes) up by the weekend.
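
As a sketch of the kind of input validation mentioned here, a caller could reject non-finite values before binning. `require_finite` is a hypothetical helper written for this example; it does not reproduce the check in the actual PR, whose details are not shown in this thread.

```python
import math


def require_finite(values, name="sample"):
    """Hypothetical pre-check: raise ValueError on NaN or +/-inf entries."""
    values = list(values)
    bad = [i for i, v in enumerate(values) if not math.isfinite(v)]
    if bad:
        raise ValueError(f"{name} contains non-finite values at positions {bad}")
    return values


masses = require_finite([100.25, 100.75, 101.5])  # made-up data, passes

try:
    require_finite([100.25, float("nan"), float("inf")])
except ValueError as exc:
    error_text = str(exc)
    print(error_text)
```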
