numpy histogram precision bug

When the weights span a large dynamic range, np.histogram returns wrong bin totals. The problem can be reproduced with this code in numpy 1.9.1 and 1.9.2:

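import numpy as np

sample_size = 10**5  # must be an integer; newer NumPy versions reject a float such as 1e5
xmin, xmax = -10., 0.
data = np.random.rand(sample_size) * (xmax - xmin) + xmin  # some uniform data
weight = np.exp(-4 * data)  # weights spanning many orders of magnitude

y, x = np.histogram(data, 50, weights=weight)
print(y[-1])  # weighted sum of the last bin as reported by np.histogram
print(np.sum(weight[(data > x[-2]) & (data < x[-1])]))  # the same sum, computed directly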

This should print two identical numbers (which is the case if, for example, we use xmin=-1.) but instead it produces:

0.0
2991.44627547

This is not a datatype issue, because every variable is float64. Passing explicit double-precision bin edges instead of a bin count does not help either.
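The thread does not state the cause. One plausible explanation (an assumption, not confirmed in this issue) is that the bin totals are obtained by differencing a running sum of the weights, in which case small weights added to a very large running total are lost to float64 rounding. The sketch below only illustrates that rounding effect with made-up magnitudes similar to those in the example; the names `big`, `small`, and `running` are hypothetical.

```python
import numpy as np

big = np.exp(40.0)   # roughly the largest weight when data is near -10
small = 1.5          # roughly the size of a weight when data is near 0

# Hypothetical running total dominated by many large weights.
running = big * 1000

# Gap between adjacent float64 values at the magnitude of the running total.
spacing = np.spacing(running)

# Add a small weight to the running total, then difference it back out.
recovered = (running + small) - running

print(spacing)    # far larger than `small`
print(recovered)  # the small weight has vanished
```

If totals are built this way, a bin that holds only small weights can come out as exactly zero even though every value is float64.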

Issue Analytics

  • State: open
  • Created 8 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

1 reaction
Kambrian commented, Jul 9, 2015

Below is a quick workaround:

bin_index = np.digitize(data, x)  # bin number of each sample, using the edges x from above
yy = np.bincount(bin_index, weights=weight, minlength=len(x) + 1)[1:-1]  # drop the two out-of-range slots
print(yy[-1])

I am not sure about its efficiency, but hopefully it is not much slower. Maybe this should replace the current histogram algorithm.
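The workaround can be wrapped in a small helper and checked against a direct masked sum. This is a minimal sketch, not NumPy's own implementation: the function name `weighted_histogram` is hypothetical, and the step that folds samples equal to the last edge into the final bin is an added assumption so that the result follows np.histogram's convention of a closed last bin.

```python
import numpy as np

def weighted_histogram(data, weights, bins):
    """Sum weights per bin with digitize + bincount (hypothetical helper)."""
    # Use np.histogram only to obtain the bin edges; its counts are ignored.
    _, edges = np.histogram(data, bins)
    # digitize returns i such that edges[i-1] <= value < edges[i].
    idx = np.digitize(data, edges)
    # Samples equal to the last edge land in slot len(edges); move them
    # into the final bin, which np.histogram treats as closed on the right.
    idx[data == edges[-1]] = len(edges) - 1
    # Slot 0 and slot len(edges) hold out-of-range samples; drop both.
    sums = np.bincount(idx, weights=weights, minlength=len(edges) + 1)[1:-1]
    return sums, edges

rng = np.random.default_rng(0)
data = rng.random(10**5) * 10.0 - 10.0   # uniform data in [-10, 0)
weight = np.exp(-4 * data)               # weights with a large dynamic range

sums, edges = weighted_histogram(data, weight, 50)
direct = weight[(data >= edges[-2]) & (data <= edges[-1])].sum()
print(sums[-1], direct)
```

Because bincount accumulates each bin separately, the total of a bin with small weights does not depend on the much larger weights in other bins.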

0 reactions
seberg commented, May 31, 2017

There is no issue with sample_size. It is a long-standing change that numpy expects you to pass integers, and 1e5 is a float, not an integer, so you now have to use 10**5 in those cases. We know it is a bit annoying, but it is the same as range(1e5), etc. in Python.
