
On confidence intervals and uncertainty intervals

See original GitHub issue

Given that this is a Bayesian method, it is strange that the uncertainty is summarized using confidence intervals rather than Bayesian uncertainty/probability/credibility intervals (via, for instance, HDIs), and that the mean rather than the median is taken as the point estimate. @WillianFuks, I am curious to hear your thoughts on what is in compile_posterior_inferences:

https://github.com/WillianFuks/tfcausalimpact/blob/master/causalimpact/inferences.py#L52

Given samples of the target time series, it should be straightforward to summarize them pointwise via, say, hdi in arviz. Or am I missing something?
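
For instance, a pointwise median and HDI can be computed directly from an array of posterior samples. A minimal sketch, with simulated draws standing in for the model's samples arranged as a (draws, time) array:

import numpy as np
from arviz.stats import hdi

# Toy stand-in for posterior draws of the target series:
# 1000 draws for each of 50 time points.
rng = np.random.default_rng(0)
samples = rng.gamma(shape=2.0, scale=1.0, size=(1000, 50))

point = np.median(samples, axis=0)                        # pointwise median
bounds = np.array([hdi(col, 0.95) for col in samples.T])  # (time, 2) array of lower/upper
lower, upper = bounds[:, 0], bounds[:, 1]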

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 13 (6 by maintainers)

Top GitHub Comments

1 reaction
IvanUkhov commented, Mar 30, 2021

I ended up writing a custom function for summarizing inferences, as I wanted to have medians and HDIs instead of means and quantile intervals. I will leave it here in case it can be helpful some time in the future:

from typing import Tuple

import numpy as np
import pandas as pd
import tensorflow_probability as tp


def _summarize_original(
    data_before: pd.DataFrame,
    data_after: pd.DataFrame,
    posterior: tp.distributions.Distribution,
    predictive: tp.distributions.Distribution,
    standardization: Tuple[float, float],
    alpha: float = 0.05,
    draw_count: int = 1000,
    random_state: int = 42,
) -> pd.DataFrame:
    from causalimpact.inferences import build_cum_index
    from causalimpact.misc import maybe_unstandardize

    def _hdi(data: np.ndarray) -> np.ndarray:
        # Compute a (1 - alpha) HDI for each time point (column) separately.
        from arviz.stats import hdi
        return np.array([hdi(column, 1 - alpha) for column in data.T]).T

    # Draw samples for the pre- and post-intervention periods and map them
    # back to the original scale.
    y_before = predictive.sample(draw_count, seed=random_state)
    y_before = maybe_unstandardize(np.squeeze(y_before.numpy()), standardization)
    y_after = posterior.sample(draw_count, seed=random_state)
    y_after = maybe_unstandardize(np.squeeze(y_after.numpy()), standardization)

    # Pointwise medians and HDIs for the pre-intervention predictions. The
    # "*_means" names are kept for compatibility, but the values are medians.
    pre_preds_means = np.median(y_before, axis=0)
    pre_preds_lower, pre_preds_upper = _hdi(y_before)
    pre_preds_means = pd.Series(pre_preds_means, index=data_before.index)
    pre_preds_lower = pd.Series(pre_preds_lower, index=data_before.index)
    pre_preds_upper = pd.Series(pre_preds_upper, index=data_before.index)

    # Pointwise medians and HDIs for the post-intervention predictions.
    post_preds_means = np.median(y_after, axis=0)
    post_preds_lower, post_preds_upper = _hdi(y_after)
    post_preds_means = pd.Series(post_preds_means, index=data_after.index)
    post_preds_lower = pd.Series(post_preds_lower, index=data_after.index)
    post_preds_upper = pd.Series(post_preds_upper, index=data_after.index)

    complete_preds_means = pd.concat([pre_preds_means, post_preds_means])
    complete_preds_lower = pd.concat([pre_preds_lower, post_preds_lower])
    complete_preds_upper = pd.concat([pre_preds_upper, post_preds_upper])

    # Pointwise effects: observed minus predicted. The bounds swap, since
    # subtracting the upper prediction yields the lower effect and vice versa.
    data = pd.concat([data_before, data_after])
    point_effects_means = data.iloc[:, 0] - complete_preds_means
    point_effects_upper = data.iloc[:, 0] - complete_preds_lower
    point_effects_lower = data.iloc[:, 0] - complete_preds_upper

    # Cumulative effects over the post-intervention period, accumulated per
    # draw and then reduced to a median and an HDI, with a leading zero so the
    # series starts at the intervention point.
    z_after = np.cumsum(data_after.iloc[:, 0].values - y_after, axis=1)
    post_cum_effects_means = np.median(z_after, axis=0)
    post_cum_effects_lower, post_cum_effects_upper = _hdi(z_after)
    index = build_cum_index(data_before.index, data_after.index)
    post_cum_effects_lower = pd.Series(
        np.concatenate([[0], post_cum_effects_lower]),
        index=index,
    )
    post_cum_effects_means = pd.Series(
        np.concatenate([[0], post_cum_effects_means]),
        index=index,
    )
    post_cum_effects_upper = pd.Series(
        np.concatenate([[0], post_cum_effects_upper]),
        index=index,
    )

    data = dict(
        complete_preds_means=complete_preds_means,
        complete_preds_lower=complete_preds_lower,
        complete_preds_upper=complete_preds_upper,
        point_effects_means=point_effects_means,
        point_effects_lower=point_effects_lower,
        point_effects_upper=point_effects_upper,
        post_cum_effects_means=post_cum_effects_means,
        post_cum_effects_lower=post_cum_effects_lower,
        post_cum_effects_upper=post_cum_effects_upper,
    )
    return pd.DataFrame(data)
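
A hypothetical usage sketch with toy inputs, just to show the expected shapes; the data, index, and normal distributions below are made up and only stand in for what a fitted model would provide:

import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Toy data: 20 pre-intervention and 10 post-intervention points.
index = pd.date_range("2021-01-01", periods=30)
data_before = pd.DataFrame({"y": np.random.randn(20)}, index=index[:20])
data_after = pd.DataFrame({"y": np.random.randn(10) + 1.0}, index=index[20:])

# Stand-ins for the model's predictive and posterior distributions; sampling
# them yields arrays of shape (draws, time, 1), which the function squeezes.
predictive = tfd.Normal(loc=tf.zeros([20, 1]), scale=1.0)
posterior = tfd.Normal(loc=tf.zeros([10, 1]), scale=1.0)

summary = _summarize_original(
    data_before,
    data_after,
    posterior,
    predictive,
    standardization=(0.0, 1.0),  # identity transform for the toy data
)
print(summary.head())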

0 reactions
IvanUkhov commented, Mar 9, 2021

I will close this then. I think switching to quantile intervals was a good move. Next time around, one could consider HDIs, but it would probably not make much of a difference unless very skewed distributions are expected.
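
To illustrate the skewness point, here is a toy comparison of an equal-tailed quantile interval and an HDI on a right-skewed sample (a lognormal is used purely as an example):

import numpy as np
from arviz.stats import hdi

rng = np.random.default_rng(0)
samples = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # right-skewed toy sample

quantile_interval = np.quantile(samples, [0.025, 0.975])  # equal-tailed, roughly [0.14, 7.1]
hdi_interval = hdi(samples, 0.95)                         # narrowest 95% interval, shifted toward zero

print(quantile_interval, hdi_interval)

For roughly symmetric posteriors, the two intervals nearly coincide.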

Thank you, and sorry for all the noise here 🙂

