[Bug] Possible bug in variational strategy
🐛 Bug
I’m reviewing the VariationalStrategy code and there may be a
bug in how the predictive mean is calculated. Alternatively
(and more likely), I am not understanding the logic.
The concern starts here:
L = self._cholesky_factor(induc_induc_covar)
if L.shape != induc_induc_covar.shape:
    # Aggressive caching can cause nasty shape incompatibilities when evaluating with different batch shapes
    # TODO: Use a hook for this
    pop_from_cache(self, "cholesky_factor")
    L = self._cholesky_factor(induc_induc_covar)
interp_term = L.inv_matmul(induc_data_covar.double()).to(full_inputs.dtype)

# Compute the mean of q(f)
# k_XZ K_ZZ^{-1/2} (m - K_ZZ^{-1/2} \mu_Z) + \mu_X
predictive_mean = (
    torch.matmul(
        interp_term.transpose(-1, -2), (inducing_values - self.prior_distribution.mean).unsqueeze(-1)
    ).squeeze(-1)
    + test_mean
)
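To spell out my reading of the snippet: writing L_{ZZ} for the Cholesky factor of K_{ZZ} (so K_{ZZ} = L_{ZZ} L_{ZZ}^T), the code computes, symbolically,

interp_term = L_{ZZ}^{-1} K_{ZX}
predictive_mean = interp_term^T (m - \mu) + \mu_X = K_{XZ} L_{ZZ}^{-T} (m - \mu) + \mu_X

where m stands for inducing_values and \mu for self.prior_distribution.mean.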
Trying to maintain the notation of the documentation provided, I believe we want to calculate

K_{XZ} K_{ZZ}^{-1} (u - \mu_u) = K_{XZ} L_{ZZ}^{-T} L_{ZZ}^{-1} (u - \mu_u)

From my understanding, interp_term (after the transpose) is giving us K_{XZ} L_{ZZ}^{-T}, and I think we’re missing a second L_{ZZ}^{-1} term.
Apologies if I’m just missing something or got my equations
wrong!
Top GitHub Comments
The variational mean is the (variational) posterior mean that we learn for u. There is still a prior mean for u (and f).
VariationalStrategy does not follow equation 18 from that paper. UnwhitenedVariationalStrategy does. VariationalStrategy applies the variational parameters in a transformed space that is generally easier to work with (a technique commonly known as whitening). See, for example:

M. Kuss and C. E. Rasmussen. Assessing approximate inference for binary Gaussian process classification. Journal of Machine Learning Research, 6(Oct):1679–1704, 2005.

A. G. d. G. Matthews. Scalable Gaussian process inference using variational methods. PhD thesis, University of Cambridge, 2017.
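To make that concrete, here is a sketch of the standard whitening argument (my own summary, not quoted from the thread). Parametrize the variational distribution over the whitened inducing values v = L_{ZZ}^{-1} (u - \mu_Z) as q(v) = N(m, S). Then E[u] - \mu_Z = L_{ZZ} m, and the predictive mean becomes

K_{XZ} K_{ZZ}^{-1} (E[u] - \mu_Z) + \mu_X
  = K_{XZ} L_{ZZ}^{-T} L_{ZZ}^{-1} L_{ZZ} m + \mu_X
  = K_{XZ} L_{ZZ}^{-T} m + \mu_X

The “missing” L_{ZZ}^{-1} cancels against the L_{ZZ} that maps m back to u-space: the learned variational mean already lives in whitened coordinates, so only one factor of L_{ZZ}^{-1} (inside interp_term) should appear. (If the whitened prior is N(0, I), the inducing_values - self.prior_distribution.mean term in the snippet subtracts zero in expectation.)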
The computations involving \mu_Z and \mu_X account for the fact that equation 18 directly assumes a zero-mean Gaussian process prior, while in code you may have any sort of prior mean function.
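As a quick numerical sanity check of the cancellation above, here is a minimal, self-contained sketch in plain PyTorch (hypothetical variable names, no GPyTorch LinearOperator machinery) verifying that the whitened and unwhitened routes give the same mean:

import torch

torch.manual_seed(0)
n, k = 5, 3                          # number of test points / inducing points

A = torch.randn(k, k)
K_ZZ = A @ A.T + k * torch.eye(k)    # random SPD inducing covariance
K_XZ = torch.randn(n, k)             # arbitrary cross-covariance for the check
m = torch.randn(k)                   # whitened variational mean

L = torch.linalg.cholesky(K_ZZ)      # K_ZZ = L @ L.T

# Unwhitened route: map m back to u-space (E[u] - \mu_Z = L @ m), then apply K_ZZ^{-1}
mean_unwhitened = K_XZ @ torch.linalg.solve(K_ZZ, L @ m)

# Whitened route, mirroring the snippet: interp_term = L^{-1} K_ZX
interp_term = torch.linalg.solve_triangular(L, K_XZ.T, upper=False)
mean_whitened = interp_term.T @ m

print(torch.allclose(mean_unwhitened, mean_whitened, atol=1e-5))  # True

Both routes agree, so no second L_{ZZ}^{-1} is needed once m is stored in the whitened space.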