Add a way to plot credible intervals with plot_ppc
Tell us about it
plot_ppc is very nice: you can control the kind of plot and also the number of posterior predictive lines to draw with num_pp_samples. I would like to ask whether it would be possible to add an option to plot credible intervals instead of single draws from the posterior predictive. Consider the following example:
import numpy as np
import arviz as az
import pymc3 as pm
from matplotlib import pyplot as plt
# Simulate observations from a known line with unit-variance noise
_x = np.random.uniform(-5, 5, 100)
_m = 1.5
_b = -0.7
_obs = np.random.normal(_x * _m + _b, 1)
# Fit the linear model and draw posterior predictive samples
with pm.Model():
    x = pm.Data("x", _x)
    m = pm.Normal("m", 0, 5)
    b = pm.Normal("b", 0, 5)
    obs = pm.Normal("obs", x * m + b, 1, observed=_obs)
    idata = pm.sample(return_inferencedata=True)
    ppc = pm.sample_posterior_predictive(idata)
    idata.extend(az.from_pymc3(posterior_predictive=ppc))
az.plot_ppc(idata);
The resulting plot is something like this
All of the individual lines from the posterior predictive samples are quite hard to read, and it’s hard to make sense of how likely it is to find a sample in a given interval.
I would like to plot something like this:
from scipy import stats  # needed for stats.gaussian_kde below
ax = az.plot_dist(idata.observed_data.to_array(), color="black", label="Observed obs")
grid = np.linspace(*ax.get_xlim(), 1000)
# Get the ppc HDI
lines = []
for line in idata.posterior_predictive.to_array().values.reshape([-1, len(_obs)]):
lines.append(stats.gaussian_kde(line)(grid))
lines = np.array(lines)
pdf = np.mean(lines, axis=0)
hdi = az.hdi(lines, hdi_prob=0.95)
az.plot_hdi(grid, hdi_data=hdi, color="C0", ax=ax, fill_kwargs={"label": "95% HDI"})
ax.plot(grid, pdf, color="C0", linestyle="--", label="Posterior predictive mean obs")
ax.legend();
where the filled-in area is the posterior predictive’s HDI at a given level (95% in this example).
Issue Analytics
- State:
- Created 2 years ago
- Comments:6 (4 by maintainers)
Yes, I am testing that the whole kde line is inside the hdi shaded area, which is what I believe should be considered and the only clear and interpretable diagnostic. Again, in my opinion the ideal solution to this is using https://arxiv.org/abs/2103.10522 though, not spaghetti plots nor the hdi proposed here (nor variations on that to try and fix the hdi region to account for whole lines).
I do believe it could be useful to add this, but we have to be careful, because it is not clear at all what exactly the hdi shaded area represents nor how to interpret lines going outside that region.
I don’t think that’s true either, and I am also quite sure about this. kdes are continuous lines, so the probability of the 2nd point being outside the region given that the 1st one was outside is different than if the 1st one was inside; they are not independent values. Yet we are calculating the hdi as if they were. Moreover, even if the hdi had the interpretation you mention, I don’t think having users estimate by themselves (and visually) whether 95% or 90% of the kde line is outside the hdi region is a good idea.
As a side note on spaghetti plots, interpreting them with an animation could be useful. Imagine the following situation. You start the animation with a spaghetti plot of 100 kde lines from the posterior predictive. Then 10 more lines are added to the plot one by one: 9 come from the posterior predictive too, and one corresponds to the observations. You have to try to guess which is the kde of the observed data. Once the 10 lines have been added, the plot is updated to highlight the observed kde. If you guessed which one it is (and if in general anyone can guess), then your model is not reproducing the generative process correctly; if it is generally not possible to tell which is which, your model is probably ok.
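The data side of that guessing game could be sketched roughly as follows. This is only an illustration, not an ArviZ feature: build_lineup is a hypothetical helper, the arrays are simulated stand-ins for the observed data and the posterior predictive draws, and the animation itself is left out.

```python
import numpy as np

def build_lineup(observed, replicates, n_decoys=9, seed=None):
    """Hide `observed` among `n_decoys` randomly chosen replicated datasets.

    Hypothetical helper. Returns the shuffled candidates with shape
    (n_decoys + 1, n_obs) and the row index at which the observed data sits.
    """
    rng = np.random.default_rng(seed)
    # Pick the decoys without replacement from the replicated datasets
    decoy_idx = rng.choice(len(replicates), size=n_decoys, replace=False)
    # The observed data is appended last, i.e. at position n_decoys
    candidates = np.vstack([replicates[decoy_idx], observed[None, :]])
    # Shuffle so that the position of the observed data is not predictable
    order = rng.permutation(n_decoys + 1)
    lineup = candidates[order]
    answer = int(np.flatnonzero(order == n_decoys)[0])
    return lineup, answer

# Simulated stand-ins (assumption): 100 replicated datasets of 50 points each
rng = np.random.default_rng(0)
observed = rng.normal(size=50)
replicates = rng.normal(size=(100, 50))
lineup, answer = build_lineup(observed, replicates, seed=1)
```

Each row of lineup would then be drawn as one kde line, with answer revealed only at the end.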
I think the ideal situation here would be to implement https://arxiv.org/abs/2103.10522 to compare several (or all) of the posterior predictive distributions to the observed one. The hdi of kde lines looks nice, but I don’t think there is any guarantee that kde lines of the same distribution will lie completely inside the shaded region with 95% probability.
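A quick simulation can illustrate the concern about whole lines. The sketch below makes several assumptions: it uses NumPy only, a hand-rolled Gaussian kde with Scott's rule in place of stats.gaussian_kde, and an equal-tailed percentile band in place of az.hdi. Every curve comes from the same standard normal distribution, so any curve that leaves the band does so by chance alone.

```python
import numpy as np

def gaussian_kde_on_grid(sample, grid):
    """Gaussian kde of a 1D sample evaluated on `grid` (Scott's rule bandwidth)."""
    n = sample.size
    bw = sample.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - sample[None, :]) / bw
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bw * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
grid = np.linspace(-4, 4, 200)

# 1000 kde curves, each built from 100 draws of the same standard normal
curves = np.array(
    [gaussian_kde_on_grid(rng.normal(size=100), grid) for _ in range(1000)]
)

# Pointwise 95% band: computed independently at every grid point
lower, upper = np.percentile(curves, [2.5, 97.5], axis=0)
inside = (curves >= lower) & (curves <= upper)

# Fraction of (curve, grid point) pairs inside the band: about 0.95 by construction
pointwise_coverage = inside.mean()
# Fraction of curves that stay inside the band over the whole grid
whole_curve_coverage = inside.all(axis=1).mean()

print(f"pointwise coverage:   {pointwise_coverage:.3f}")
print(f"whole-curve coverage: {whole_curve_coverage:.3f}")
```

The second number comes out below the nominal 95%, which is the gap between a pointwise band and a statement about entire kde lines.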
We can definitely add the option for the kde hdi shaded area but we have to be careful in how we document it.