numpy histogram precision bug
When dealing with weights that have a large dynamic range, the weighted bin totals returned by `np.histogram` are wrong. The problem can be reproduced with this code in numpy 1.9.1 and 1.9.2:
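```python
import numpy as np

sample_size = 10**5  # must be an int; see the maintainer comment below
xmin, xmax = -10., 0.
data = np.random.rand(sample_size) * (xmax - xmin) + xmin  # some uniform data
weight = np.exp(-4 * data)  # weights spanning about 17 orders of magnitude
y, x = np.histogram(data, 50, weights=weight)
print(y[-1])
# np.histogram's last bin is closed on both ends: x[-2] <= data <= x[-1]
print(np.sum(weight[(data >= x[-2]) & (data <= x[-1])]))
```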
This should print the same number twice, up to rounding (which is the case if, for example, we use `xmin=-1.`), but instead it produces:

```
0.0
2991.44627547
```
This is not a datatype issue, because every variable is float64. Replacing the bin-count parameter with an explicit array of double-precision bin edges does not help.
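A likely mechanism (an assumption; the thread does not spell it out): if the weighted bin totals are obtained as differences of a running cumulative sum of the weights sorted by data value, then by the last bin the running total already contains all the huge weights and is of order 1e20. Near that magnitude adjacent float64 values are tens of thousands apart, so each small weight in the last bin rounds away and the difference comes out as 0.0, even though everything is float64. The sketch below shows the effect with made-up weights (1000 copies of `exp(40)` followed by 1000 copies of `2.5`), not with the issue's random data.

```python
import numpy as np

# Made-up weights, sorted the way a histogram would see them: a block of
# huge weights (like exp(-4*x) near x = -10) followed by small ones.
big = np.full(1000, np.exp(40.0))   # ~2.35e17 each
small = np.full(1000, 2.5)          # stand-in for the last bin's weights
w = np.concatenate([big, small])

# Bin total computed as a difference of the running cumulative sum.
cw = np.concatenate(([0.0], np.cumsum(w)))
via_cumsum = cw[-1] - cw[len(big)]

# Bin total computed directly from the bin's own weights.
direct = small.sum()

# Gap between adjacent float64 values near the running total (~2.35e20).
# Any addend smaller than half this gap is rounded away.
gap = np.spacing(cw[len(big)])

print(via_cumsum, direct, gap)
```

Both numbers are float64 throughout; the loss comes purely from adding small values to a very large running total and then subtracting two nearly equal totals.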
Issue Analytics
- Created 8 years ago
- Comments: 7 (2 by maintainers)
Top GitHub Comments
Below is a quick workaround:
I'm not sure about its efficiency, but hopefully it is not much slower. Maybe this should replace the histogram algorithm.
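As an independent sketch of this kind of fix (an assumption, not necessarily the commenter's code), each bin can sum only its own weights with `np.bincount`, so a huge weight in one bin cannot swamp a small one in another. `weighted_histogram` is a hypothetical helper name; it assumes an integer bin count, non-empty data with `data.min() < data.max()`, and uniform edges from `data.min()` to `data.max()`, which is what `np.histogram` uses by default for an integer `bins`.

```python
import numpy as np


def weighted_histogram(data, nbins, weights):
    """Hypothetical helper: weighted histogram with uniform bins from
    data.min() to data.max(), summing each bin's weights directly.

    Assumes data is non-empty and data.min() < data.max().
    """
    data = np.asarray(data, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    # Uniform edges over the data range (np.histogram's default for an int bin count).
    edges = np.linspace(data.min(), data.max(), nbins + 1)
    # Bin i covers edges[i] <= x < edges[i + 1].
    idx = np.searchsorted(edges, data, side="right") - 1
    # The last bin is closed on the right, so the maximum goes into bin nbins - 1.
    idx[idx == nbins] = nbins - 1
    # bincount adds each weight only into its own bin: no cumulative-sum differencing.
    hist = np.bincount(idx, weights=weights, minlength=nbins)
    return hist, edges


rng = np.random.default_rng(0)
data = rng.random(10**5) * 10.0 - 10.0  # uniform on [-10, 0)
weight = np.exp(-4 * data)
y, x = weighted_histogram(data, 50, weight)
last_direct = np.sum(weight[(data >= x[-2]) & (data <= x[-1])])
print(y[-1], last_direct)
```

Each bin's total is accumulated only from that bin's own weights, so its relative error stays near float64 rounding no matter how large the other bins are.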
There is no issue with `sample_size` as such; it is a long-standing change that numpy expects integers there, and `1e5` is a float, not an integer, so you now have to use `10**5` (or `int(1e5)`) instead in those cases. We know it is a bit annoying, but it is the same behavior as `range(1e5)` and similar calls in Python.
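A quick check of the maintainer's point, assuming a current numpy release where a float array size raises `TypeError` (numpy 1.9 still accepted it, as the reproduction above shows). `raises_type_error` is a hypothetical helper used only for this demonstration.

```python
import numpy as np


def raises_type_error(fn):
    """Hypothetical helper: True if calling fn() raises TypeError."""
    try:
        fn()
    except TypeError:
        return True
    return False


# Python's range() rejects a float count...
range_rejects_float = raises_type_error(lambda: range(1e5))
# ...and so does a current numpy for array sizes.
numpy_rejects_float = raises_type_error(lambda: np.random.rand(1e5))

# Integer sizes work: 10**5 (or int(1e5)) is an int.
sample_size = 10**5
data = np.random.rand(sample_size)
print(range_rejects_float, numpy_rejects_float, data.shape)
```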