Overflow Error from binned_statistic_dd
Error message:
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
<ipython-input-86-0811ef0d48f7> in <module>()
16 sl = SpectrumList([x for x in spectrum_list if x.id in ids])
17 print sl.tolist()
---> 18 sl_df = make_spectrumlist(sl)
19
20 pls_df = pd.concat([sl_df, mtd["Diagnosis"]], axis=1).dropna(axis=0)
<ipython-input-63-51c201b1e4f9> in make_spectrumlist(sl)
1 def make_spectrumlist(sl):
----> 2 sl.binning(bin_size=0.5, mass_statistic="max")
3 sl.value_imputation(method="basic", threshold=0.1)
4 sl.normalise(method="tic")
5 sl.transform(method="log10")
/home/keo7/ProjectsNew/metabolomics/denisa-saliva/env/local/lib/python2.7/site-packages/dimepy/SpectrumList.pyc in binning(self, bin_size, int_statistic, mass_statistic, inplace)
174
175 for spectrum in self._spectrum:
--> 176 bins, intensities = _apply_binning(spectrum, bins)
177 if inplace:
178 spectrum.masses = bins
/home/keo7/ProjectsNew/metabolomics/denisa-saliva/env/local/lib/python2.7/site-packages/dimepy/SpectrumList.pyc in _apply_binning(spectrum, bins)
160 spectrum.intensities,
161 bins=bins,
--> 162 statistic=int_statistic)
163 indx = np.invert(np.isnan(b_i))
164
/home/keo7/ProjectsNew/metabolomics/denisa-saliva/env/local/lib/python2.7/site-packages/scipy/stats/_binned_statistic.pyc in binned_statistic(x, values, statistic, bins, range)
177
178 medians, edges, binnumbers = binned_statistic_dd(
--> 179 [x], values, statistic, bins, range)
180
181 return BinnedStatisticResult(medians, edges[0], binnumbers)
/home/keo7/ProjectsNew/metabolomics/denisa-saliva/env/local/lib/python2.7/site-packages/scipy/stats/_binned_statistic.pyc in binned_statistic_dd(sample, values, statistic, bins, range, expand_binnumbers)
534 for i in xrange(Ndim):
535 # Find the rounding precision
--> 536 decimal = int(-np.log10(dedges[i].min())) + 6
537 # Find which points are on the rightmost edge.
538 on_edge = np.where(np.around(sample[:, i], decimal) ==
OverflowError: cannot convert float infinity to integer
Issue Analytics
- State:
- Created 5 years ago
- Reactions:2
- Comments:10 (9 by maintainers)

Looked at this a little more today because I wasn’t satisfied and the error wasn’t fully reproduced. Looking at the stack trace from @KeironO, it looks like one of the edges of the bins was numerically 0. For example, if I run:
That reproduces the last part of the stack trace posted. The issue is that a 0 goes into:
https://github.com/scipy/scipy/blob/master/scipy/stats/_binned_statistic.py#L583
which looks like it gets produced by one of the calculations here:
https://github.com/scipy/scipy/blob/master/scipy/stats/_binned_statistic.py#L546-L567
AFAIK the bin calculation doesn’t restrict the bin edges to be non-zero, but this function needs them to be non-zero because it takes their logarithm.
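To see why a zero is fatal here, the expression from the last traceback frame can be evaluated on its own. This is only a minimal sketch: the array `dedges` below is made up for illustration and is not the data from the report.

```python
import numpy as np

# Made-up stand-in for the array in the traceback; one entry is exactly 0.
dedges = np.array([0.5, 0.5, 0.0])

# log10(0) is -inf, so the negated value is +inf.
# errstate only silences NumPy's divide-by-zero warning.
with np.errstate(divide="ignore"):
    exponent = -np.log10(dedges.min())

try:
    decimal = int(exponent) + 6  # same shape as the failing line
    error = None
except OverflowError as exc:
    error = str(exc)

print(exponent, error)
```

The `int()` call is what raises, since infinity has no integer representation.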
There are a couple of ways to fix this part: either check for a 0 bin edge and shift it slightly, or make the bin calculation never return a bin edge that is numerically 0.
The check-and-shift is easier to implement but would result in slightly different bin widths. The fix in the bin calculation is less well defined; shifting all bins by an epsilon so that no bin edge is 0 would be the most straightforward way to do it.
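As a rough illustration of the check-and-shift idea, the sketch below nudges any edge that is exactly 0 to a tiny positive value. The helper name `shift_zero_edges` and the choice of machine epsilon are assumptions made for this example; this is not the fix that went into SciPy.

```python
import numpy as np

def shift_zero_edges(edges, eps=np.finfo(float).eps):
    """Hypothetical helper: return a copy of `edges` with exact zeros moved to `eps`."""
    edges = np.asarray(edges, dtype=float)
    # Only entries equal to 0.0 are changed; all other edges are kept as-is.
    return np.where(edges == 0.0, eps, edges)

edges = [0.0, 0.5, 1.0]
shifted = shift_zero_edges(edges)

# The logarithm is now finite for every edge.
exponents = -np.log10(shifted)
print(shifted, exponents)
```

The first bin becomes very slightly narrower than 0.5, which is the change in bin widths mentioned above.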
Quick status update: I’ve got a reproducing test, and a ValueError is now thrown for np.nan and np.inf. I’m still working on the reproducing example and fix for the 0 bin edge defect. I expect to have a PR (with both tests and fixes) up by the weekend.
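For readers who want to guard their own inputs in the meantime, a check along the following lines rejects np.nan and np.inf before binning. The helper name `check_finite` and the error message are hypothetical; the actual check in the PR may look different.

```python
import numpy as np

def check_finite(sample):
    """Hypothetical validation helper: raise ValueError on nan or inf."""
    sample = np.asarray(sample, dtype=float)
    if not np.isfinite(sample).all():
        raise ValueError("sample contains non-finite values (nan or inf)")
    return sample

clean = check_finite([1.0, 2.0, 3.0])

try:
    check_finite([1.0, np.nan, 3.0])
    rejected = False
except ValueError:
    rejected = True

print(clean, rejected)
```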