
Negative Variances from predictive_gradients()

See original GitHub issue

I’m trying to get at the gradient of the predictions, but the variance of the estimate is coming back negative. I’m not sure if what I’m doing is wrong, but as an MWE, the following seems to demonstrate the issue:

import GPy
import numpy as np

# keep deviates deterministic (actual seed not important)
np.random.seed(123)

# draw from our function
X = np.random.uniform(-3.,3.,(20,1))
Y = np.sin(X) + np.random.randn(20,1)*0.05

# set up model
m = GPy.models.GPRegression(X, Y)

# optimise parameters (optional)
m.optimize()

# where do we want to predict our function
Xp = np.linspace(-4,4,51)[:,None]

# get our predictions of mean and gradient
mu,var = m.predict(Xp)
mug,varg = m.predictive_gradients(Xp)

# display variance in estimate of gradient?
print(varg)

I get back a number of negative values for the variance, which I’m not expecting… If I plot my test points and the function being fit, they look fine:

[Figure: test points and the fitted function (gpy-dx-var)]

But the variance seems to have a lot of unexpected structure, i.e. it doesn’t just look like numerical instability to me. I’m not sure if this matters, but I’m running without Weave (I’m on OS X and am currently struggling to get the OpenMP code compiling), so I don’t know whether the C version is correct.

I’ve just started to use GPy, so sorry if this issue/question is malformed! While poking around I’ve also seen m._raw_predict(), which looks useful as well. The underscore prefix makes it look as though it should be internal/private; is it recommended to use this function?

Issue Analytics

  • State: closed
  • Created: 8 years ago
  • Comments: 8 (1 by maintainers)

Top GitHub Comments

1 reaction
befelix commented, Aug 17, 2015

The derivative of the mean should be the same for all kernel functions.

So I actually had to implement that myself for RBF kernels. I couldn’t figure out an easy generic way for all kernels either, mostly because there doesn’t seem to be a way to get the second derivative of the kernel, which is needed for the prior term.
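
For reference, the formula I’m working from (a sketch of my derivation, so please double-check it) is the posterior covariance of the derivative process at a test point x_*:

\mathrm{Cov}\!\left[\frac{\partial f}{\partial x_*}\right] = \left.\frac{\partial^2 k(x_*, x_*')}{\partial x_*\,\partial x_*'}\right|_{x_*'=x_*} - \frac{\partial k(x_*, X)}{\partial x_*}\left(K + \sigma_n^2 I\right)^{-1}\frac{\partial k(x_*, X)}{\partial x_*}^{\top}

For the RBF kernel the first (prior) term reduces to diag(sigma_f^2 / lengthscale_q^2), which is what gets hard-coded as np.diag(self.kern.variance / lengthscales**2) below, and (K + sigma_n^2 I)^{-1} is, I believe, what GPy stores as posterior.woodbury_inv. As far as I can tell only the first derivative of the kernel is exposed by the library, which is why the prior term ends up RBF-specific.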

Anyway, since you’re just interested in RBFs, here’s my RBF-specific code… without any warranty or any claim that this is a good implementation. There may well be a much better way; I’m not involved in this project at all, just a user. If you find a bug, please let me know.

import numpy as np


def predict_derivatives(self, x_new):
    """
    Predict derivatives of a gp at the points x_new.

    Author: Felix Berkenkamp (befelix)

    Parameters:
    -----------
    self: instance of GPy.core.gp
        With RBF kernel function
    x_new: 2d-array
        Every row is a new data point at which to evaluate the derivative.

    Returns:
    --------
    mu: 2d-array
        The mean derivative
    var: 2/3d-array
        If there is only one data point to predict var is a 2d array with
        the variance matrix. Otherwise it is a 3d matrix where
        var[i, :, :] is the ith covariance matrix.
    """
    if not self.kern.name == 'rbf':
        raise AttributeError('The variance prior only works for RBF kernels.')
    x_new = np.atleast_2d(x_new)

    # Compute mean, initialize variance
    mu = self.kern.gradients_X(self.posterior.woodbury_vector.T, x_new, self.X)
    var = np.empty((x_new.shape[0], x_new.shape[1], x_new.shape[1]),
                   dtype=mu.dtype)

    # Make sure lengthscales are of the right dimensions
    lengthscales = self.kern.lengthscale.values
    if not lengthscales.shape[0] == self.X.shape[1]:
        lengthscales = np.tile(lengthscales, (self.X.shape[1],))

    def dk_dx(X, X2):
        """Compute the derivative of k(X, X2) with respect to X."""

        # Derivative with respect to r
        dK_dr = self.kern.dK_dr_via_X(X, X2)

        # Chain-rule factor shared by all input dimensions: (1 / r) * dK/dr
        tmp = self.kern._inv_dist(X, X2) * dK_dr

        # dK_dX = dK_dr * dr_dx
        # dr_dx1 = invdist * (x1 - x'1) / l1**2
        dk_dx = np.empty((X2.shape[0], X.shape[1]), dtype=np.float64)
        for q in range(self.input_dim):
            dk_dx[:, q] = tmp * (X[:, q, None] - X2[None, :, q])
        return dk_dx / (lengthscales ** 2)

    # Compute derivative variance for each test point
    for i in range(x_new.shape[0]):
        dk = dk_dx(x_new[None, i, :], self.X)
        # Would be great if there was a way to get the prior directly from the
        # library
        # But I think only d1 is implemented
        var[i, :, :] = np.diag(self.kern.variance / (lengthscales ** 2)) -\
            np.dot(dk.T, np.dot(self.posterior.woodbury_inv, dk))

    # If there was only one test point to begin with, squeeze the
    # corresponding dimension
    if x_new.shape[0] <= 1:
        var = var.squeeze(0)
    return mu, var
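
A minimal usage sketch, continuing the MWE from the question (the function takes the model as its first argument, so it can simply be called as a plain function, or attached to the model as a method):

mu_d, var_d = predict_derivatives(m, Xp)

# with a 1-d input and 51 test points, mu_d should come out as (51, 1) and
# var_d as (51, 1, 1); the diagonal entries are proper posterior variances
# of df/dx and should be non-negative
print(mu_d.shape, var_d.shape)
print(var_d[:, 0, 0].min() >= 0)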

0 reactions
jameshensman commented, Aug 17, 2015

This bug arose from a misunderstanding of the predictive_gradients function. I have opened a new issue to request a function with the desired behaviour: #213
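
For anyone landing here with the same confusion: as far as I can tell, predictive_gradients returns the gradients of the predictive mean and of the predictive variance with respect to the inputs, not the variance of the gradient, so negative entries in the second output are expected wherever the predictive variance is decreasing. A quick finite-difference sanity check along those lines (a sketch continuing the MWE above; exact output shapes may differ between GPy versions, hence the squeezes):

import numpy as np

eps = 1e-5
mu0, var0 = m.predict(Xp)
mu1, var1 = m.predict(Xp + eps)
mug, varg = m.predictive_gradients(Xp)

# mug ~ d(mean)/dx, varg ~ d(variance)/dx -- the latter can legitimately be negative
print(np.allclose(np.squeeze(mug), np.squeeze((mu1 - mu0) / eps), atol=1e-3))
print(np.allclose(np.squeeze(varg), np.squeeze((var1 - var0) / eps), atol=1e-3))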

