Consider interpolation in HDI calculations


Tell us about it

In Bambi we have a function called plot_cap that is used to visualize the fitted curve. We overlay a credible interval so users can see the uncertainty around the mean estimate. Internally, we’re using az.hdi() to obtain the bounds. Today, while implementing some improvements, I found that the plots look quite noisy. See the following examples.

import arviz as az
import bambi as bmb
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
print(az.__version__)
# 0.14.0

The following is Bambi-specific code; it’s not that important for what I want to show.

data = pd.read_csv("https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv")
model = bmb.Model("mpg ~ 1 + hp", data)
idata = model.fit(random_seed=1234)

# Obtain predictions
new_data = pd.DataFrame({"hp": np.linspace(50, 320, 200)})
idata = model.predict(idata, data=new_data, inplace=False)
y_hat = idata.posterior["mpg_mean"]

Get the bands using az.hdi()

y_hat_bounds = az.hdi(y_hat, 0.94)["mpg_mean"].T.to_numpy()
fig, ax = plt.subplots(figsize=(7, 5), dpi=120)
ax.fill_between(new_data["hp"], y_hat_bounds[0], y_hat_bounds[1], alpha=0.5);

[image: band computed with az.hdi()]

Get the bands using .quantile() in DataArray, which calls np.quantile under the hood (if I understood correctly)

y_hat_bounds = y_hat.quantile(q=(0.03, 0.97), dim=("chain", "draw"))
fig, ax = plt.subplots(figsize=(7, 5), dpi=120)
ax.fill_between(new_data["hp"], y_hat_bounds[0], y_hat_bounds[1], alpha=0.5);

[image: band computed with .quantile()]
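To look at the same comparison without fitting a model, here is a minimal NumPy-only sketch. The intercept and slope draws are made-up stand-ins for a posterior (they are not the mtcars fit), and narrowest_interval and roughness are hypothetical helpers, not ArviZ functions. narrowest_interval only mimics the idea of an HDI whose endpoints are existing samples.

```python
import numpy as np

rng = np.random.default_rng(1234)
n_draws = 4000

# Assumption: synthetic "posterior" draws, not the output of model.fit().
intercept = rng.normal(30.0, 1.6, size=n_draws)
slope = rng.normal(-0.07, 0.01, size=n_draws)
hp = np.linspace(50, 320, 200)
y_hat = intercept[:, None] + slope[:, None] * hp[None, :]  # shape (draw, hp)


def narrowest_interval(x, prob):
    """Narrowest interval spanning floor(prob * n) sorted samples.

    Both endpoints are actual samples (hypothetical helper).
    """
    x = np.sort(x)
    k = int(np.floor(prob * x.size))
    i = int(np.argmin(x[k:] - x[: x.size - k]))
    return x[i], x[i + k]


def roughness(curve):
    """Sum of absolute second differences; larger means a more jagged curve."""
    return float(np.abs(np.diff(curve, n=2)).sum())


# Sample-endpoint intervals, one per hp value: shape (2, 200)
hdi_bounds = np.array([narrowest_interval(col, 0.94) for col in y_hat.T]).T
# Equal-tailed intervals with np.quantile's default linear interpolation
eti_bounds = np.quantile(y_hat, [0.03, 0.97], axis=0)

rough_hdi = roughness(hdi_bounds[0]) + roughness(hdi_bounds[1])
rough_eti = roughness(eti_bounds[0]) + roughness(eti_bounds[1])
print(f"roughness, sample-endpoint interval: {rough_hdi:.3f}")
print(f"roughness, interpolated quantiles:   {rough_eti:.3f}")
```

If the jaggedness comes from the interval endpoints jumping between draws as hp changes, I would expect the first number to be noticeably larger than the second.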

Thoughts on implementation

I’m not aware of the historical details that led to the current implementation of az.hdi(). But I think it’s worth considering alternatives, since the current behavior returns very noisy results. I have other examples where it looks even worse, for example:

[image: another example with a noisier band]

Tagging @aloctavodia because we talked about this via chat

Issue Analytics

  • State: open
  • Created 10 months ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
OriolAbril commented, Dec 21, 2022

I don’t see the current issue title (consider using np.quantile for hdi) as viable, and instead see two underlying issues that need addressing.

The first is being able to use any function to generate intervals or bands. This is a known issue and we are working on it, but imo it requires refactoring the plots module. I have started some experiments at https://xrtist.readthedocs.io/en/latest/, for example.

The second is stabilising our current hdi approach in a manner similar to what np.quantile does. I hadn’t really realized that the instability that comes with returning existing samples can actually be fixed with these interpolation methods. I don’t see much value in focusing this issue on the first thing, but I think it would be very helpful to focus on this second one.
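One possible reading of "interpolation" for HDIs, as a rough sketch only: search over the lower-tail probability and let np.quantile interpolate both endpoints, instead of picking two existing samples. The function name hdi_interpolated and the grid_size parameter are hypothetical, and this is not how az.hdi() is implemented.

```python
import numpy as np


def hdi_interpolated(x, prob=0.94, grid_size=201):
    """Approximate HDI with interpolated endpoints (hypothetical helper).

    For each candidate lower-tail probability a in [0, 1 - prob], take the
    interval [Q(a), Q(a + prob)] and keep the narrowest one. Q is np.quantile
    with its default linear interpolation between order statistics.
    """
    x = np.asarray(x, dtype=float)
    lower_p = np.linspace(0.0, 1.0 - prob, grid_size)
    # Clip guards against a + prob landing a hair above 1.0 in floating point.
    upper_p = np.clip(lower_p + prob, 0.0, 1.0)
    lower = np.quantile(x, lower_p)
    upper = np.quantile(x, upper_p)
    i = int(np.argmin(upper - lower))
    return float(lower[i]), float(upper[i])


# Usage on synthetic draws (assumption: standard normal, not a real posterior)
rng = np.random.default_rng(0)
x = rng.normal(size=4000)
lo, hi = hdi_interpolated(x, prob=0.94)
print(lo, hi)
```

The search is restricted to a grid of tail probabilities for simplicity; a unimodal distribution is assumed, as with any single-interval HDI.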

0 reactions
tomicapretto commented, Dec 21, 2022

@OriolAbril I think I understand why using quantiles would be misleading for HDIs: [P2.5%, P97.5%] doesn’t necessarily give a 95% HDI. But I don’t understand the name suggestion.
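A small worked example of that point, using an Exponential(1) distribution as an assumed illustration (it is not from the issue). Its density is strictly decreasing, so the 95% HDI starts at 0, while the equal-tailed interval cuts 2.5% from each side. Both intervals hold 95% of the mass, but they are different intervals.

```python
import math


def exp_quantile(p):
    """Quantile function of Exponential(rate=1)."""
    return -math.log(1.0 - p)


def exp_cdf(x):
    """CDF of Exponential(rate=1)."""
    return 1.0 - math.exp(-x)


# Equal-tailed interval: [P2.5%, P97.5%]
eti = (exp_quantile(0.025), exp_quantile(0.975))
# HDI: the density is highest at 0 and decreasing, so the narrowest
# 95% interval is [0, P95%].
hdi = (0.0, exp_quantile(0.95))

eti_width = eti[1] - eti[0]
hdi_width = hdi[1] - hdi[0]
print("ETI:", eti, "width:", eti_width)
print("HDI:", hdi, "width:", hdi_width)
```

The widths come out at about 3.66 for the equal-tailed interval and about 3.00 for the HDI. For a symmetric unimodal distribution the two intervals coincide.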
