Add a way to plot credible intervals with plot_ppc

Tell us about it

plot_ppc is very nice: you can control the kind of plot and also the number of posterior predictive lines to draw with num_pp_samples. Would it be possible to add an option to plot credible intervals instead of single draws from the posterior predictive? Consider the following example:

import numpy as np
import arviz as az
import pymc3 as pm
from matplotlib import pyplot as plt

# Simulate data from a known linear model: obs ~ Normal(m * x + b, 1)
_x = np.random.uniform(-5, 5, 100)
_m = 1.5
_b = -0.7
_obs = np.random.normal(_x * _m + _b, 1)

with pm.Model():
    x = pm.Data("x", _x)
    m = pm.Normal("m", 0, 5)
    b = pm.Normal("b", 0, 5)
    obs = pm.Normal("obs", x * m + b, 1, observed=_obs)

    idata = pm.sample(return_inferencedata=True)
    ppc = pm.sample_posterior_predictive(idata)
    idata.extend(az.from_pymc3(posterior_predictive=ppc))

az.plot_ppc(idata);

The resulting plot looks something like this: [image]

All of the individual lines from the posterior predictive samples are quite hard to read, and it’s hard to make sense of how likely it is to find a sample in a given interval.

I would like to plot something like this:

from scipy import stats

ax = az.plot_dist(idata.observed_data.to_array(), color="black", label="Observed obs")
grid = np.linspace(*ax.get_xlim(), 1000)

# Evaluate a KDE of each posterior predictive draw on the grid, then take the pointwise HDI
lines = []
for line in idata.posterior_predictive.to_array().values.reshape([-1, len(_obs)]):
    lines.append(stats.gaussian_kde(line)(grid))
lines = np.array(lines)
pdf = np.mean(lines, axis=0)
hdi = az.hdi(lines, hdi_prob=0.95)
az.plot_hdi(grid, hdi_data=hdi, color="C0", ax=ax, fill_kwargs={"label": "95% HDI"})
ax.plot(grid, pdf, color="C0", linestyle="--", label="Posterior predictive mean obs")
ax.legend();

[image]

where the filled in area is the posterior predictive’s HDI at a certain level.

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

OriolAbril commented, Apr 10, 2021

Yes, I am testing that the whole kde line is inside the hdi shaded area, which is what I believe should be considered and the only clear and interpretable diagnostic. Again, in my opinion the ideal solution to this is using https://arxiv.org/abs/2103.10522 though, not spaghetti plots nor the hdi proposed here (nor variations on that to try and fix the hdi region to account for whole lines).

I do believe it could be useful to add this, but we have to be careful, because it is not at all clear what exactly the hdi shaded area represents, nor how to interpret lines going outside that region.

My intuitive understanding of what an HDI for a PPC looks like is that if I have a 94% HDI, 94% of my KDE should fall inside the HDI on average if my model is solid.

I don’t think that’s true either, and I am also quite sure about this. KDEs are continuous lines, so the probability of the 2nd point being outside the region given that the 1st one was outside is different from what it would be if the 1st one had been inside; they are not independent values. Yet we are calculating the hdi as if they were. Moreover, even if the hdi had the interpretation you mention, I don’t think having users estimate by themselves (and visually) whether 95% or 90% of the kde line is inside the hdi region is a good idea.
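
To make the dependence argument concrete, here is a minimal simulation sketch, not code from the thread. It assumes standard-normal replicated datasets, a hand-rolled fixed-bandwidth Gaussian KDE (the helper `gaussian_kde_on_grid` and the bandwidth 0.4 are hypothetical choices), and an equal-tailed 95% pointwise interval standing in for the HDI. It compares how often a single grid point falls inside the band with how often a whole KDE line does.

```python
import numpy as np


def gaussian_kde_on_grid(sample, grid, bandwidth):
    """Evaluate a fixed-bandwidth Gaussian KDE of `sample` at each grid point."""
    z = (grid[:, None] - sample[None, :]) / bandwidth
    norm = sample.size * bandwidth * np.sqrt(2 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm


rng = np.random.default_rng(0)
grid = np.linspace(-4, 4, 200)

# 500 replicated datasets of 100 points each, all from the same distribution
curves = np.array(
    [gaussian_kde_on_grid(rng.normal(size=100), grid, 0.4) for _ in range(500)]
)

# Pointwise equal-tailed 95% band (assumption: used here in place of the HDI)
lower, upper = np.percentile(curves, [2.5, 97.5], axis=0)
inside = (curves >= lower) & (curves <= upper)

pointwise_coverage = inside.mean()               # share of (curve, grid point) pairs inside
whole_line_coverage = inside.all(axis=1).mean()  # share of curves that never leave the band

print(f"pointwise coverage:  {pointwise_coverage:.3f}")
print(f"whole-line coverage: {whole_line_coverage:.3f}")
```

Under these assumptions the pointwise coverage stays close to the nominal level by construction, while the share of lines lying entirely inside the band is much lower, even though every curve comes from the same distribution.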

As a side note on spaghetti plots, interpreting them with an animation could be useful. Imagine the following situation. You start the animation with a spaghetti plot of 100 kde lines from the posterior predictive, then 10 more lines are added to the plot one by one: 9 come from the posterior predictive too and one corresponds to the observations. You have to try to guess which is the kde of the observed data. Once the 10 lines have been added, the plot is updated to highlight the observed kde. If you guessed which one it is (and if in general anyone can guess), then your model is not reproducing the generative process correctly; if it’s generally not possible to tell which is which, your model is probably ok.
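
The data-handling part of that guessing game could be sketched as follows. This is an illustrative assumption, not an ArviZ feature: `make_lineup` is a hypothetical helper, the arrays are synthetic stand-ins for `idata.observed_data` and `idata.posterior_predictive`, and the plotting or animation step is left out.

```python
import numpy as np


def make_lineup(observed, pp_draws, n_decoys=9, rng=None):
    """Hide the observed dataset among randomly chosen posterior predictive draws.

    Returns the shuffled list of datasets and the position of the observed one.
    """
    rng = np.random.default_rng() if rng is None else rng
    decoy_rows = rng.choice(len(pp_draws), size=n_decoys, replace=False)
    datasets = [pp_draws[i] for i in decoy_rows] + [observed]
    order = rng.permutation(len(datasets))
    lineup = [datasets[i] for i in order]
    # The observed dataset was stored at index n_decoys before shuffling
    observed_position = int(np.flatnonzero(order == n_decoys)[0])
    return lineup, observed_position


rng = np.random.default_rng(1)
observed = rng.normal(size=100)         # stand-in for the observed data
pp_draws = rng.normal(size=(200, 100))  # stand-in for posterior predictive draws
lineup, observed_position = make_lineup(observed, pp_draws, rng=rng)
```

Each entry of `lineup` would then be drawn as one kde line, with `observed_position` revealed only after the viewer has made a guess.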

OriolAbril commented, Apr 8, 2021

I think the ideal situation here would be to implement https://arxiv.org/abs/2103.10522 to compare several (or all) of the posterior predictive distributions to the observed one. The hdi of kde lines looks nice, but I don’t think there is any guarantee that kde lines of the same distribution will lie completely inside the shaded region with 95% probability.

We can definitely add the option for the kde hdi shaded area but we have to be careful in how we document it.
