Unexpected results for the mean of a DataFrame of ufloat from the uncertainties package.
Related to #6898.

I find it very convenient to use a DataFrame of ufloat objects from the uncertainties package. Each entry is a (value, error) pair and could represent the result of Monte Carlo simulations or an experiment.
At present, taking sums along both axes gives the expected result, but taking the mean does not.
import pandas as pd
import numpy as np
from uncertainties import unumpy

# Nominal values and uncertainties for a 3x4 grid
value = np.arange(12).reshape(3, 4)
err = 0.01 * np.arange(12).reshape(3, 4) + 0.005

# Array of ufloat objects (nominal value +/- standard deviation)
data = unumpy.uarray(value, err)
df = pd.DataFrame(data, index=['r1', 'r2', 'r3'], columns=['c1', 'c2', 'c3', 'c4'])
Examples:
print(df)
c1 c2 c3 c4
r1 0.000+/-0.005 1.000+/-0.015 2.000+/-0.025 3.000+/-0.035
r2 4.00+/-0.04 5.00+/-0.06 6.00+/-0.07 7.00+/-0.08
r3 8.00+/-0.09 9.00+/-0.10 10.00+/-0.11 11.00+/-0.12
df.sum(axis=0) # This works
c1 12.00+/-0.10
c2 15.00+/-0.11
c3 18.00+/-0.13
c4 21.00+/-0.14
dtype: object
df.sum(axis=1) # This works
r1 6.00+/-0.05
r2 22.00+/-0.12
r3 38.00+/-0.20
dtype: object
df.mean(axis=0) # This does not work
Series([], dtype: float64)
Expected (`df.apply(lambda x: x.sum() / x.size)`)
c1 4.000+/-0.032
c2 5.00+/-0.04
c3 6.00+/-0.04
c4 7.00+/-0.05
dtype: object
df.mean(axis=1) # This does not work
r1 NaN
r2 NaN
r3 NaN
dtype: float64
Expected (`df.T.apply(lambda x: x.sum() / x.size)`)
r1 1.500+/-0.011
r2 5.500+/-0.031
r3 9.50+/-0.05
dtype: object
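Until mean() handles object-dtype columns, the sum-based expressions above can be wrapped in a small helper. This is only a sketch; the name mean_object is mine, not a pandas API:

# Workaround sketch: divide the working sum() by the number of
# elements averaged over along the chosen axis.
def mean_object(frame, axis=0):
    n = frame.shape[axis]
    return frame.sum(axis=axis) / n

mean_object(df, axis=0)   # reproduces the expected column means above
mean_object(df, axis=1)   # reproduces the expected row means above

This is equivalent to the `df.apply(lambda x: x.sum() / x.size)` expressions used for the expected output.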
Seen from the outside, it looks like in both cases Pandas decrees that the result of `mean()` should be of type `float64`: in @rth's example above the NumPy array actually contains integers, that are converted to `float64` (which is doable); in the case of `uncertainties.UFloat` numbers with uncertainty, forcing the result to `float64` is mostly meaningless (as this would get rid of the uncertainty) and `mean()` does not produce the expected result.

In contrast, as the original post shows, Pandas is more open on the data type of `sum()`, which is, correctly, `object`, for `uncertainties.UFloat` objects. I think that it is desirable that since Pandas is able to `sum()`, it be able to get the `mean()` too (since the mean is not much more than a sum).

I just wanted to be sure that you're not using subclassing or something else like that. In any case, I think this is probably a pandas bug (but would need someone to work through/figure out). We should have a fallback implementation of `mean` (like NumPy's `mean`) that works on object arrays.
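For illustration, here is a minimal sketch of what such an object-dtype fallback could look like (my own code, not pandas internals); it assumes the elements support addition and division by an integer, as `uncertainties.UFloat` does:

import numpy as np

# Fallback sketch: reduce the Python objects with + and divide by the
# count, which is essentially what np.mean does for an object array.
def object_mean(values):
    arr = np.asarray(values, dtype=object)
    return arr.sum() / arr.size

# e.g. object_mean(df['c1']) reproduces the expected 4.000+/-0.032 shown above.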