Standard deviation and its place in the code base
I brought this up at a previous meeting and would appreciate input from more people as well. Essentially, the issue is that most places in the codebase that refer to something as a standard deviation (for example, in the Data class) are not actually referring to a standard deviation. The quantity is more akin to the coefficient of variation.
The Problem
For example, in https://github.com/simpeg/simpeg/blob/1548d788e8e19f1a5ec8b6264770bbb5374ae3c0/SimPEG/data.py#L24 the data class has three attributes related to the noise level: `standard_deviation`, `noise_floor`, and `uncertainty`.
The `standard_deviation` in this class is actually a ratio that is multiplied by the absolute value of the data and then added to `noise_floor` to produce the `uncertainty`:
`uncertainty = standard_deviation * np.abs(d_obs) + noise_floor`
In a statistical sense, this `uncertainty` would actually be the data’s standard deviation.
Therein lies the problem: `standard_deviation` does not actually refer to a standard deviation. The name comes from colloquially referring to data as having a “10%” standard deviation.
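To make the naming mismatch concrete, here is a minimal sketch of the formula above using plain NumPy. It does not call the SimPEG Data class; the data values, the 10% ratio, and the floor of 1.0 are made-up numbers chosen only for illustration.

```python
import numpy as np

# Hypothetical observed data (illustrative values, not from any real survey).
d_obs = np.array([100.0, -20.0, 0.5])

# What the Data class calls `standard_deviation`: a dimensionless ratio (10% here).
standard_deviation = 0.1

# An absolute floor, in the same units as the data.
noise_floor = 1.0

# What the Data class calls `uncertainty`, following the formula quoted above.
uncertainty = standard_deviation * np.abs(d_obs) + noise_floor

print(uncertainty)
```

Note that `standard_deviation` is unitless while `uncertainty` carries the units of the data, which is one way to see that only the latter can be a standard deviation of the data.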
Suggestions
The definition of `noise_floor` seems to be self-explanatory. I would therefore suggest renaming `uncertainty` to `standard_deviation` (since that is what it is).
Questions
We would then need a better name for what `standard_deviation` was, possibly `noise_ratio`, `relative_error`, `percent_error`, etc. I’d like to open this up to discussion and suggestions.
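As a rough sketch of what the proposed naming could look like, the toy class below stores the ratio and the floor and exposes the combined value as `standard_deviation`. The class name `NoiseModel` is hypothetical and this is not the SimPEG Data class; `relative_error` is picked from the candidate names purely for illustration, since the final name is exactly what is open for discussion.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class NoiseModel:
    """Hypothetical container illustrating the proposed attribute names."""

    dobs: np.ndarray
    relative_error: float = 0.0  # candidate new name for the old `standard_deviation`
    noise_floor: float = 0.0     # unchanged: absolute floor in data units

    @property
    def standard_deviation(self) -> np.ndarray:
        # Proposed meaning: the quantity currently called `uncertainty`.
        return self.relative_error * np.abs(self.dobs) + self.noise_floor


model = NoiseModel(
    dobs=np.array([100.0, -20.0]),
    relative_error=0.05,
    noise_floor=2.0,
)
print(model.standard_deviation)
```

With this layout the ratio and the statistical quantity can no longer be confused by name, at the cost of changing the meaning of an existing attribute for current users.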
Issue Analytics
- Created 4 years ago
- Reactions: 1
- Comments: 9 (8 by maintainers)
Top GitHub Comments
First choice: `percent_error`, second choice: `relative_error`. Not a big deal if we just document it and/or throw a warning if we detect all values >> 1. I think it’s more in line with the notation that Doug et al. have been using, and it sort of implies a multiplication. But deep inside I think we should scrap this whole business of two values doing computation in the background and just force people to assign `uncertainty` directly.
2 cents. In geobipy that percentage multiplier is referred to as “relative”, and noise_floor is “additive”. The standard deviation is sqrt((relative*dobs)^2 + additive^2).
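The two ways of combining the terms mentioned in this thread give different numbers, which the following sketch compares. It uses plain NumPy and does not call geobipy or SimPEG; the variable names `linear` and `quadrature` and all values are assumptions made for illustration.

```python
import numpy as np

# Illustrative values only.
d_obs = np.array([100.0, -20.0, 0.5])
relative = 0.1   # the percentage multiplier ("relative")
additive = 1.0   # the noise floor ("additive")

# Linear sum, as in the formula from the issue description.
linear = relative * np.abs(d_obs) + additive

# Sum in quadrature, as in the formula quoted in the comment above.
quadrature = np.sqrt((relative * d_obs) ** 2 + additive ** 2)

print(linear)
print(quadrature)
```

For non-negative terms the quadrature value never exceeds the linear sum, and the two differ most where the relative and additive contributions are of similar size, so the choice of convention matters when comparing noise settings across packages.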