[Bug] Possible bug in variational strategy
🐛 Bug
I’m reviewing the VariationalStrategy code and there may be a
bug in how the predictive mean is calculated. Alternatively
(and more likely), I am not understanding the logic.
The concern starts here:
L = self._cholesky_factor(induc_induc_covar)
if L.shape != induc_induc_covar.shape:
    # Aggressive caching can cause nasty shape incompatibilities when evaluating with different batch shapes
    # TODO: Use a hook for this
    pop_from_cache(self, "cholesky_factor")
    L = self._cholesky_factor(induc_induc_covar)
interp_term = L.inv_matmul(induc_data_covar.double()).to(full_inputs.dtype)

# Compute the mean of q(f)
# k_XZ K_ZZ^{-1/2} (m - K_ZZ^{-1/2} \mu_Z) + \mu_X
predictive_mean = (
    torch.matmul(
        interp_term.transpose(-1, -2), (inducing_values - self.prior_distribution.mean).unsqueeze(-1)
    ).squeeze(-1)
    + test_mean
)
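To spell out my reading of the snippet: writing L_{ZZ} for the Cholesky factor of K_{ZZ} (so K_{ZZ} = L_{ZZ} L_{ZZ}^T), the code computes, symbolically,

interp_term = L_{ZZ}^{-1} K_{ZX}
predictive_mean = interp_term^T (m - \mu) + \mu_X = K_{XZ} L_{ZZ}^{-T} (m - \mu) + \mu_X

where m stands for inducing_values and \mu for self.prior_distribution.mean.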
Trying to maintain the notation of the documentation provided, I believe we want to calculate

K_{XZ} K_{ZZ}^{-1} (u - \mu_u) = K_{XZ} L_{ZZ}^{-T} L_{ZZ}^{-1} (u - \mu_u)

From my understanding, interp_term (after the transpose) is giving us K_{XZ} L_{ZZ}^{-T}, and I think we’re missing a second L_{ZZ}^{-1} term.
Apologies if I’m just missing something or got my equations
wrong!
Top GitHub Comments
The variational mean is the (variational) posterior mean that we learn for u. There is still a prior mean for u (and f).
VariationalStrategy does not follow equation 18 from that paper. UnwhitenedVariationalStrategy does. VariationalStrategy applies the variational parameters in a transformed space that is generally easier to work with (a technique commonly known as whitening). See, for example:

M. Kuss and C. E. Rasmussen. Assessing approximate inference for binary Gaussian process classification. Journal of Machine Learning Research, 6(Oct):1679–1704, 2005.

A. G. d. G. Matthews. Scalable Gaussian process inference using variational methods. PhD thesis, University of Cambridge, 2017.
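To make that concrete, here is a sketch of the standard whitening argument (my own summary, not quoted from the thread). Parametrize the variational distribution over the whitened inducing values v = L_{ZZ}^{-1} (u - \mu_Z) as q(v) = N(m, S). Then E[u] - \mu_Z = L_{ZZ} m, and the predictive mean becomes

K_{XZ} K_{ZZ}^{-1} (E[u] - \mu_Z) + \mu_X
  = K_{XZ} L_{ZZ}^{-T} L_{ZZ}^{-1} L_{ZZ} m + \mu_X
  = K_{XZ} L_{ZZ}^{-T} m + \mu_X

The “missing” L_{ZZ}^{-1} cancels against the L_{ZZ} that maps m back to u-space: the learned variational mean already lives in whitened coordinates, so only one factor of L_{ZZ}^{-1} (inside interp_term) should appear. (If the whitened prior is N(0, I), the inducing_values - self.prior_distribution.mean term in the snippet subtracts zero in expectation.)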
The computations involving \mu_Z and \mu_X account for the fact that equation 18 directly assumes a zero-mean Gaussian process prior, while in code you may have any sort of prior mean function.
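As a quick numerical sanity check of the cancellation above, here is a minimal, self-contained sketch in plain PyTorch (hypothetical variable names, no GPyTorch LinearOperator machinery) verifying that the whitened and unwhitened routes give the same mean:

import torch

torch.manual_seed(0)
n, k = 5, 3                          # number of test points / inducing points

A = torch.randn(k, k)
K_ZZ = A @ A.T + k * torch.eye(k)    # random SPD inducing covariance
K_XZ = torch.randn(n, k)             # arbitrary cross-covariance for the check
m = torch.randn(k)                   # whitened variational mean

L = torch.linalg.cholesky(K_ZZ)      # K_ZZ = L @ L.T

# Unwhitened route: map m back to u-space (E[u] - \mu_Z = L @ m), then apply K_ZZ^{-1}
mean_unwhitened = K_XZ @ torch.linalg.solve(K_ZZ, L @ m)

# Whitened route, mirroring the snippet: interp_term = L^{-1} K_ZX
interp_term = torch.linalg.solve_triangular(L, K_XZ.T, upper=False)
mean_whitened = interp_term.T @ m

print(torch.allclose(mean_unwhitened, mean_whitened, atol=1e-5))  # True

Both routes agree, so no second L_{ZZ}^{-1} is needed once m is stored in the whitened space.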