Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Standard deviation and it's place in the code base

See original GitHub issue

I brought this up at a previous meeting and would appreciate input from more people as well. Essentially the issue is that most place in the codebase that refer to things as standard deviation (for example in the Data Class) it is not actually standard deviation. It is more akin to the coefficient of variation.

The Problem

For example in https://github.com/simpeg/simpeg/blob/1548d788e8e19f1a5ec8b6264770bbb5374ae3c0/SimPEG/data.py#L24 the data class has three attributes related to the noise level: standard_deviation, noise_floor, and uncertainty.

The standard_deviation in this class is actually a ratio multiplied by the absolute value of the data, which is then added to noise_floor, to produce the uncertainty,

uncertainty = standard_deviation x np.abs(d_obs) + noise_floor

In a statistical sense, this uncertainty would actually be the data’s standard deviation.

Thus lies the problem, standard_deviation does not actually refer to standard deviation. It actually comes from colloquially referring to a data as having “10%” standard deviation.

Suggestions

The definition of noise_floor seems to be self explanatory. I would suggest then renaming uncertainty to actually be standard_deviation (since that is what it is).

Questions

Therefore, we need a better name for what standard_deviation was, possibly noise_ratio, relative_error, percent_error, etc.? I’d like to open this up to discussion and suggestions.

Issue Analytics

State:
Created 4 years ago
Reactions:1
Comments:9 (8 by maintainers)

Top GitHub Comments

2reactions

domfourniercommented, Feb 20, 2020

First choice: percent_error, second choice: relative_error. Not a big deal if we just document it and/or throw a warning if we detect all values >>1. I think it’s more along with the notation that Doug et al. have been using, and sort of implies a multiplication.

But deep inside I think we should scrap this whole business of two values doing computation in the background and just force people to assign directly uncertainty.

1reaction

leonfokscommented, Feb 13, 2020

2 cents. In geobipy that percentage multiplier is referred to as “relative”, and noise_floor is “additive”. The standard deviation is sqrt((relative*dobs)^2 + additive^2)

Top Results From Across the Web

C Program to Calculate Standard Deviation - Programiz

Note: The program calculates the standard deviation of a population. If you need to find the standard deviation of a sample, the formula...

Standard deviation - Wikipedia

The standard deviation of a random variable, sample, statistical population, data set, or probability distribution is the square root of its variance. It...

Standard Deviation - Formula, Definition, Methods, Examples

Standard deviation is commonly abbreviated as SD and denoted by 'σ' and it tells about the value that how much it has deviated...

Standard Deviation (STDDev) in Performance Monitoring

The standard deviation (STDDev) response time value is used in reports to provide greater depth of analysis. It shows how much variation there...

How to Find Standard Deviation in R? - DigitalOcean

In simple words the formula is defined as - Standard deviation is the square root of the 'variance'.