Notes on errorbar enhancements
Here are some notes on planned changes to error bar specification.
Currently, what the error bars show is controlled through the `ci` parameter. This can be either a number, setting the width of a bootstrap confidence interval, or the string `"sd"`, indicating that the error bar covers +/- one standard deviation of the data around the estimate.
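For concreteness, a minimal sketch of the current usage (assuming a seaborn version where `lineplot` still accepts `ci`; the dataset is just a placeholder):

```python
import seaborn as sns

fmri = sns.load_dataset("fmri")

# Nonparametric: 68% bootstrap confidence interval around the estimate
sns.lineplot(data=fmri, x="timepoint", y="signal", ci=68)

# "sd": error band covers +/- one standard deviation of the data
sns.lineplot(data=fmri, x="timepoint", y="signal", ci="sd")
```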
Some problems with this have been routinely noted:
- There’s no option for parametric confidence intervals/standard error
- There’s no option for showing a measure of data spread other than +/- 1 sigma
ci="sd"
does not really make conceptual sense as a parametrizing (it is the result of a short-sighted API decision)
In effect, you can think of the options as having a 2D taxonomy defined by whether the error bars show a measure of estimate certainty or data spread and whether the computation is parametric or nonparametric. Currently, we occupy two cells in this matrix:
|               | Estimate certainty     | Data spread |
|---------------|------------------------|-------------|
| Parametric    |                        | `ci="sd"`   |
| Nonparametric | `ci=95`, `ci=68`, etc. |             |
I would like to fill out the matrix. But we have a few challenges:
- As mentioned, the current API is not great, and overloading the meaning of `ci` further is a nonstarter
- There is no centralized location in the code where this parameter is interpreted and used
Plans for the new API will involve a new parameter, probably called `errorbar` but possibly `error`, `errbar`, or some other shorthand, that accepts a tuple of the form `(kind, level)`. The first element determines what the error bars show, and the second parametrizes them. One proposal is to fill out the space like this:
|               | Estimate certainty | Data spread     |
|---------------|--------------------|-----------------|
| Parametric    | `("se", scale)`    | `("sd", scale)` |
| Nonparametric | `("ci", size)`     | `("pi", size)`  |
IMO, there is a lot of sense to this. You have four options for `kind`, each named using a bigram initialism. There are two kinds of level parameters:
- `scale`: multiplicatively scales a parametric error metric (e.g. `("sd", 3)` gives you a 3-sigma error bar, `("se", 1.96)` gives you a ~95% parametric confidence interval)
- `size`: sets the size of a nonparametric interval, with percentiles (of the bootstrap distribution for `ci` and of the input data for `pi`) at `(1 - size) / 2` and `1 - (1 - size) / 2`
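As a sketch of how the two level types might be interpreted (purely illustrative; the function name, signature, and default values are hypothetical, not the proposed seaborn internals):

```python
import numpy as np

def compute_errorbar(data, kind, level, estimator=np.mean, n_boot=1000, seed=None):
    """Illustrative interpretation of a (kind, level) errorbar tuple."""
    data = np.asarray(data)
    est = estimator(data)
    rng = np.random.default_rng(seed)

    if kind == "sd":  # parametric data spread: +/- level standard deviations
        half = level * data.std()
        return est - half, est + half
    if kind == "se":  # parametric estimate certainty: +/- level standard errors
        half = level * data.std() / np.sqrt(len(data))
        return est - half, est + half

    # Nonparametric intervals: percentiles at (1 - size) / 2 and 1 - (1 - size) / 2
    percentiles = 100 * np.array([(1 - level) / 2, 1 - (1 - level) / 2])
    if kind == "pi":  # nonparametric data spread: percentiles of the input data
        return tuple(np.percentile(data, percentiles))
    if kind == "ci":  # nonparametric estimate certainty: bootstrap percentiles
        boots = np.array([
            estimator(rng.choice(data, size=len(data), replace=True))
            for _ in range(n_boot)
        ])
        return tuple(np.percentile(boots, percentiles))

    raise ValueError(f"Unknown errorbar kind: {kind!r}")
```

Under this reading, `compute_errorbar(x, "se", 1.96)` approximates a 95% parametric CI around the mean, while `compute_errorbar(x, "pi", 0.95)` returns the 2.5th and 97.5th percentiles of the data.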
There are also some potential drawbacks:
- `"pi"` (i.e., “percentile interval”) doesn’t seem to be a commonly used term for a nonparametric measure of data spread. Actually, I’m not sure there really is a term in the stats literature for such an interval, even though it’s a very reasonable thing to plot (e.g. #1501)
- If you really want parametric 95% confidence intervals, this parametrization leaves you limited to a Z interval (and requires you to understand how to construct one from a standard error)
API decisions aside, the right implementation is going to take some thinking. Currently each module does its own errorbar computations. Most errorbars appear in the context of an aggregation-with-estimator operation, which can likely be abstracted. The other place they show up is in the regression module, where error bars are shown around the regression line. This needs to be handled differently, but statsmodels now has the `get_prediction` method, which will do a lot of the work for us. We’ll need an implementation general enough to handle special cases (like logistic regression, where the SE/SD scaling should happen in logit space).
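For reference, a minimal sketch of what `get_prediction` provides for an OLS fit (the toy data and variable names are placeholders; other model families, like logistic regression, would need analogous handling):

```python
import numpy as np
import statsmodels.api as sm

# Toy data standing in for what the regression module would fit
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2 * x + rng.normal(scale=3, size=100)

results = sm.OLS(y, sm.add_constant(x)).fit()

# get_prediction gives both the CI of the mean and the prediction interval
grid = sm.add_constant(np.linspace(0, 10, 50))
pred = results.get_prediction(grid).summary_frame(alpha=0.05)

# Columns include mean, mean_ci_lower/upper (estimate certainty)
# and obs_ci_lower/upper (a prediction interval, i.e. data spread)
print(pred[["mean", "mean_ci_lower", "mean_ci_upper",
            "obs_ci_lower", "obs_ci_upper"]].head())
```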
Here are some assorted open questions:
- Should we accept simple strings (e.g. `errorbar="sd"`) with a default level value used internally? (See the sketch after this list.)
- This simple 4-option system is still fairly limiting; it may disappoint those who would like to be able to use a generic function to get error bars (e.g. #2332). What might that API look like?
- Is it a sensible API option for `sd` to correspond to the prediction interval in a regression model?
- Should standard error correspond to the estimator and raise if the one used doesn’t have a defined standard error? In other words, what would we do with `estimator="median", errorbar="se"`? And if the estimator is a callable, should we use its name to associate it with the correct standard error function?
- It would be nice to have seaborn support multiple error bars from a sequence of `level` parameters, e.g. 1-2-3 sigma bands or 68-95-99 CIs (e.g. #1492). I like this kind of plot, but each plotting function will have to define its own logic for showing multiple error bars (e.g. layered alpha for error bands in `lineplot`, lines of diminishing width in `pointplot`). But still, if it’s going to happen, we should at least plan for it here.
- What about additional arguments for bootstrapping (i.e. `n_boot`, `seed`)? It would be nice to reduce the number of parameters in the main function signatures, but I would like to keep the argument for `errorbar` a simple tuple and not a more complex object that could take optional parameters. I think…
- What about `loess`? (#552) Bootstrapping is still very slow, but statsmodels seems to still not have analytic confidence bands.
- What’s the right order of operations for working on this? It should probably not be (fully) implemented until the categorical/regression modules can be refactored to use the core objects (where this should be handled).
Top GitHub Comments
Oh I guess the point is that if you make it easy to add error bars at specific values, users can just do their bespoke computation externally to seaborn, obviating the need for accepting a generic function.
Hmm, it’s puzzling indeed that nobody talks about “percentile intervals.”
Perhaps it’s more common to be interested in asymmetric data quantiles rather than confidence bounds? In practice, people definitely do plot this sort of thing; I’ve just seen it explicitly called “the 5th and 95th percentiles” rather than “the 90% percentile interval”.