Consider interpolation in HDI calculations

Tell us about it

In Bambi we have a function called plot_cap that is used to obtain visualizations of the fitted curve. We overlay a credible interval so users can visualize the uncertainty around the mean estimate. Internally, we’re using az.hdi() to obtain the bounds. Today, while implementing some improvements, I found that the plots look quite noisy. See the following examples:

import arviz as az
import bambi as bmb
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
print(az.__version__)
# 0.14.0

The following is Bambi-specific code; it’s not that important for what I want to show.

data = pd.read_csv("https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv")
model = bmb.Model("mpg ~ 1 + hp", data)
idata = model.fit(random_seed=1234)

# Obtain predictions
new_data = pd.DataFrame({"hp": np.linspace(50, 320, 200)})
idata = model.predict(idata, data=new_data, inplace=False)
y_hat = idata.posterior["mpg_mean"]

Get the bands using az.hdi()

y_hat_bounds = az.hdi(y_hat, 0.94)["mpg_mean"].T.to_numpy()
fig, ax = plt.subplots(figsize=(7, 5), dpi=120)
ax.fill_between(new_data["hp"], y_hat_bounds[0], y_hat_bounds[1], alpha=0.5);

Get the bands using .quantile() on the DataArray, which calls np.quantile under the hood (if I understood correctly)

y_hat_bounds = y_hat.quantile(q=(0.03, 0.97), dim=("chain", "draw"))
fig, ax = plt.subplots(figsize=(7, 5), dpi=120)
ax.fill_between(new_data["hp"], y_hat_bounds[0], y_hat_bounds[1], alpha=0.5);
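To make the difference between the two approaches concrete without Bambi, here is a minimal standard-library sketch. It assumes the sample-based HDI is the narrowest window of sorted draws holding the requested mass, so its endpoints are always existing draws, while the quantile version interpolates linearly between neighbouring order statistics. The helper names narrowest_interval and quantile_linear are made up for illustration, and this is a simplification, not ArviZ's or NumPy's actual code.

```python
import random


def narrowest_interval(samples, mass=0.94):
    """Narrowest window of sorted samples that holds about `mass` of them."""
    xs = sorted(samples)
    n = len(xs)
    width = int(mass * n)  # index distance between the two endpoints
    # Try every window [xs[i], xs[i + width]] and keep the shortest one.
    start = min(range(n - width), key=lambda i: xs[i + width] - xs[i])
    return xs[start], xs[start + width]


def quantile_linear(samples, q):
    """Quantile with linear interpolation between order statistics."""
    xs = sorted(samples)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


rng = random.Random(1234)
draws = [rng.gauss(0.0, 1.0) for _ in range(4000)]

# Both endpoints here are values that literally appear in `draws`.
hdi_low, hdi_high = narrowest_interval(draws, 0.94)
# These endpoints are usually blends of two adjacent draws.
q_low, q_high = quantile_linear(draws, 0.03), quantile_linear(draws, 0.97)

print("narrowest interval:", hdi_low, hdi_high)
print("interpolated quantiles:", q_low, q_high)
```

In the plots above this is done independently for each of the 200 hp values, so any jitter in which draws end up as endpoints shows up as jitter along the band.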

Thoughts on implementation

I’m not aware of the historical details that led to the current implementation of az.hdi(). But I think it’s worth considering alternatives, since the current behavior returns very noisy results. I have other examples where it looks even worse, for example here.

Tagging @aloctavodia because we talked about this via chat

Top GitHub Comments

OriolAbril commented, Dec 21, 2022

I don’t see the current issue title (consider using np.quantile for hdi) as viable, and instead see two underlying issues that need addressing.

The first is being able to use any function to generate intervals or bands. This is a known issue and we are working on it, but imo it requires refactoring the plots module. I have started some experiments at https://xrtist.readthedocs.io/en/latest/ for example.

The second is stabilising our current hdi approach in a manner similar to what np.quantile does. I hadn’t really realized that the instability that comes with returning existing samples can actually be fixed with these interpolation methods. I don’t see much value in focusing this issue on the first thing, but I think it would be very helpful to focus on this second one.
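One possible reading of "stabilising hdi with interpolation" is sketched below. It is purely an assumption about what such a change could look like, not a proposal from the thread and not ArviZ code: instead of picking endpoints among the existing draws, it searches over a grid of lower-tail probabilities p and keeps the narrowest interval [Q(p), Q(p + mass)], where Q is a linearly interpolated quantile function. The names interpolated_hdi and quantile_linear and the grid parameter are hypothetical, and whether this removes the noise seen in the plots is not shown here.

```python
import random


def quantile_linear(xs_sorted, q):
    """Linearly interpolated quantile of an already sorted list."""
    q = min(max(q, 0.0), 1.0)  # guard against float round-off past [0, 1]
    pos = q * (len(xs_sorted) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs_sorted) - 1)
    return xs_sorted[lo] + (pos - lo) * (xs_sorted[hi] - xs_sorted[lo])


def interpolated_hdi(samples, mass=0.94, grid=200):
    """Hypothetical helper: narrowest [Q(p), Q(p + mass)] over a grid of p."""
    xs = sorted(samples)
    tail = 1.0 - mass
    # Candidate lower-tail probabilities from 0 up to 1 - mass.
    candidates = [tail * k / grid for k in range(grid + 1)]
    best_p = min(
        candidates,
        key=lambda p: quantile_linear(xs, p + mass) - quantile_linear(xs, p),
    )
    return quantile_linear(xs, best_p), quantile_linear(xs, best_p + mass)


rng = random.Random(1234)
draws = [rng.gauss(0.0, 1.0) for _ in range(4000)]
xs = sorted(draws)

low, high = interpolated_hdi(draws, 0.94)
et_low, et_high = quantile_linear(xs, 0.03), quantile_linear(xs, 0.97)

print("interpolated narrowest interval:", low, high)
print("equal-tailed interval:", et_low, et_high)
```

Because the equal-tailed choice p = 0.03 is one of the grid candidates, the result is never wider than the equal-tailed interval, and for a symmetric posterior the two should nearly coincide.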

tomicapretto commented, Dec 21, 2022

@OriolAbril I think I understand why using quantiles would be misleading for HDIs: [P2.5%, P97.5%] doesn’t necessarily give a 95% HDI. But I don’t understand the name suggestion.
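The point that [P2.5%, P97.5%] need not be a 95% HDI is easiest to see with a skewed distribution. The sketch below is illustrative only: it uses exponential draws, and the helpers narrowest_interval and equal_tailed_interval are invented for this example. For an exponential density the highest-density region starts near zero, so the narrowest 95% window comes out shorter than the equal-tailed one and sits further to the left.

```python
import random


def narrowest_interval(samples, mass):
    """Narrowest window of sorted samples that holds about `mass` of them."""
    xs = sorted(samples)
    width = int(mass * len(xs))
    start = min(range(len(xs) - width), key=lambda i: xs[i + width] - xs[i])
    return xs[start], xs[start + width]


def equal_tailed_interval(samples, mass):
    """Interval cutting (1 - mass) / 2 from each tail (no interpolation)."""
    xs = sorted(samples)
    tail = (1.0 - mass) / 2.0
    lo_idx = int(tail * (len(xs) - 1))
    hi_idx = int((1.0 - tail) * (len(xs) - 1))
    return xs[lo_idx], xs[hi_idx]


rng = random.Random(2022)
skewed = [rng.expovariate(1.0) for _ in range(5000)]

hdi = narrowest_interval(skewed, 0.95)
eti = equal_tailed_interval(skewed, 0.95)
hdi_width = hdi[1] - hdi[0]
eti_width = eti[1] - eti[0]

print("narrowest 95% interval:", hdi, "width", hdi_width)
print("equal-tailed 95% interval:", eti, "width", eti_width)
```

For a symmetric, unimodal posterior such as the mpg_mean draws in the issue, the two intervals should be close, which is why the quantile band looks like a smoothed version of the HDI band there.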