
Using SKIP - NaNs encountered matrix-vector multiplication

See original GitHub issue

We are trying to use SKIP from this notebook, since we want to be able to run GP regression on a large dataset with more than 5 dimensions.

We have followed the steps in the notebook and have implemented the code below. It works most of the time; however, sometimes the line output = self.model(X_train) results in a covariance matrix containing only NaN values. This of course means that training fails when loss = -mll(output, y_train) is run.

What might be the cause of this?

import gpytorch
import torch
import tqdm

# `Model` is the asker's own base class (its import is not shown in the issue).

class GPRegressionModel(gpytorch.models.ExactGP):

    def __init__(self, X_train, y_train, likelihood, grid_size=100):
        super(GPRegressionModel, self).__init__(X_train, y_train, likelihood)
        X_dims = X_train.shape[1]
        self.mean_module = gpytorch.means.ConstantMean()
        self.base_covar_module = gpytorch.kernels.RBFKernel()
        # SKIP: a 1-D KISS-GP (grid interpolation) kernel applied to each input
        # dimension, with the per-dimension kernels multiplied together
        self.covar_module = gpytorch.kernels.ProductStructureKernel(
            gpytorch.kernels.ScaleKernel(
                gpytorch.kernels.GridInterpolationKernel(self.base_covar_module, grid_size=grid_size, num_dims=1)
            ), num_dims=X_dims
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)


class GPyRModelRBFSKIP(Model):

    def __init__(self,
                 likelihood: gpytorch.likelihoods.Likelihood,
                 training_iterations: int = 60,
                 do_standardise: bool = True):
        self.name = 'GPyR_RBF_SKIP'
        self.model = None
        self.likelihood = likelihood
        self.training_iterations = training_iterations
        Model.__init__(self, do_standardise)

    def fit(self, X_train, y_train, standardise: bool = True, verbose: bool = False, batch_size: int = 5_000):

        if self.do_standardise:
            self.update_standardisation_values(X_train, y_train)
            X_train, y_train = self.standardise(X_train, y_train)

        if len(X_train.shape) == 1:
            X_train = X_train.reshape(-1, 1)

        # Convert to tensor
        X_train = torch.tensor(X_train, dtype=torch.float32)
        y_train = torch.tensor(y_train, dtype=torch.float32)

        if torch.cuda.is_available():
            X_train = X_train.cuda()
            y_train = y_train.cuda()
        
        # Build the SKIP model on the (already standardised) training data
        self.model = GPRegressionModel(X_train, y_train, self.likelihood)

        if torch.cuda.is_available():
            self.model = self.model.cuda()
            self.likelihood = self.likelihood.cuda()

        # Train the model
        # Find optimal model hyperparameters
        self.model.train()
        self.likelihood.train()

        optimizer = torch.optim.Adam(self.model.parameters(), lr=0.1)  # Includes GaussianLikelihood parameters

        # "Loss" for GPs - the marginal log likelihood
        mll = gpytorch.mlls.ExactMarginalLogLikelihood(self.likelihood, self.model)

        # Training loop/function from SKIP notebook
        def train():
            iterator = tqdm.tqdm(range(self.training_iterations))
            for i in iterator:
                # Zero backprop gradients
                optimizer.zero_grad()
                with gpytorch.settings.use_toeplitz(False), gpytorch.settings.max_root_decomposition_size(30):
                    # Get output from model and compute the loss
                    output = self.model(X_train)
                    loss = -mll(output, y_train)
                    # Backprop derivatives
                    loss.backward()
                optimizer.step()
                torch.cuda.empty_cache()

        # See dkl_mnist.ipynb for explanation of this flag
        with gpytorch.settings.use_toeplitz(True):
            train()
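
For anyone debugging the same symptom: before computing the loss, it can help to confirm that the NaNs really originate in the covariance rather than the mean. A hypothetical diagnostic (not part of the original code; assumes GPyTorch 1.6, where lazy kernel tensors expose .evaluate()) that could be dropped into fit() after the model is built:

# Hypothetical NaN check on a small subset of the training data.
# Densifying the full SKIP covariance would be far too large, so we
# only evaluate a 256x256 block of the kernel matrix.
with torch.no_grad(), gpytorch.settings.use_toeplitz(False):
    subset = X_train[:256]
    covar_block = self.model.covar_module(subset).evaluate()
    print("NaNs in covariance block:", torch.isnan(covar_block).any().item())
    print("NaNs in mean:", torch.isnan(self.model.mean_module(subset)).any().item())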

The Traceback we get:

Traceback (most recent call last):
  File "D:/OneDrive/OneDrive - Danmarks Tekniske Universitet/DTU-DESKTOP-FEQGP1B/6.semester/Bachelorprojekt/ticra/Bachelor_project_2022/Framework/SKIP_model_investigation.py", line 57, in <module>
    model.fit(X_train, y_train)
  File "D:\OneDrive\OneDrive - Danmarks Tekniske Universitet\DTU-DESKTOP-FEQGP1B\6.semester\Bachelorprojekt\ticra\Bachelor_project_2022\Framework\Models\Gaussian_Process_Regression\with_GPytorch\GPyR_RBF_SKIP.py", line 105, in fit
    train()
  File "D:\OneDrive\OneDrive - Danmarks Tekniske Universitet\DTU-DESKTOP-FEQGP1B\6.semester\Bachelorprojekt\ticra\Bachelor_project_2022\Framework\Models\Gaussian_Process_Regression\with_GPytorch\GPyR_RBF_SKIP.py", line 93, in train
    loss = -mll(output, y_train)
  File "C:\Users\Mølle\AppData\Local\Programs\Python\Python38\lib\site-packages\gpytorch\module.py", line 30, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "C:\Users\Mølle\AppData\Local\Programs\Python\Python38\lib\site-packages\gpytorch\mlls\exact_marginal_log_likelihood.py", line 62, in forward
    res = output.log_prob(target)
  File "C:\Users\Mølle\AppData\Local\Programs\Python\Python38\lib\site-packages\gpytorch\distributions\multivariate_normal.py", line 169, in log_prob
    inv_quad, logdet = covar.inv_quad_logdet(inv_quad_rhs=diff.unsqueeze(-1), logdet=True)
  File "C:\Users\Mølle\AppData\Local\Programs\Python\Python38\lib\site-packages\gpytorch\lazy\lazy_tensor.py", line 1334, in inv_quad_logdet
    inv_quad_term, logdet_term = func(
  File "C:\Users\Mølle\AppData\Local\Programs\Python\Python38\lib\site-packages\gpytorch\functions\_inv_quad_log_det.py", line 157, in forward
    solves, t_mat = lazy_tsr._solve(rhs, preconditioner, num_tridiag=num_random_probes)
  File "C:\Users\Mølle\AppData\Local\Programs\Python\Python38\lib\site-packages\gpytorch\lazy\lazy_tensor.py", line 674, in _solve
    return utils.linear_cg(
  File "C:\Users\Mølle\AppData\Local\Programs\Python\Python38\lib\site-packages\gpytorch\utils\linear_cg.py", line 188, in linear_cg
    raise RuntimeError("NaNs encountered when trying to perform matrix-vector multiplication")
RuntimeError: NaNs encountered when trying to perform matrix-vector multiplication

Version

GPyTorch version: 1.6.0
PyTorch version: 1.11.0+cu113

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5

Top GitHub Comments

1 reaction
KasperNiklas commented, Apr 25, 2022

@Balandat it now works when we use torch.double!

Thank you both for your replies 👍
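
For readers landing here: the fix amounts to building the training tensors in float64 and casting the model and likelihood to double. A minimal sketch of the change against the fit() method above (same variable names; this is a reconstruction, not the exact code from the thread):

# Sketch of the torch.double fix inside fit()
X_train = torch.tensor(X_train, dtype=torch.float64)
y_train = torch.tensor(y_train, dtype=torch.float64)

self.model = GPRegressionModel(X_train, y_train, self.likelihood).double()
self.likelihood = self.likelihood.double()

if torch.cuda.is_available():
    X_train, y_train = X_train.cuda(), y_train.cuda()
    self.model = self.model.cuda()
    self.likelihood = self.likelihood.cuda()

Note that float64 roughly doubles memory use compared to float32, which is consistent with the out-of-memory attempt mentioned in the comment below.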

0 reactions
KasperNiklas commented, Apr 25, 2022

Hi @Balandat, this might be the issue; we will look into it. However, at first try we run out of memory when using torch.double.

Also, thanks for your quick replies! It is awesome that the GPyTorch team is so quick at giving responses 😃
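
One knob that directly affects SKIP's memory footprint is the interpolation grid, grid_size=100 per dimension in the code above. A hedged sketch of two possible mitigations for the float64 out-of-memory case; neither comes from the thread:

# Hypothetical mitigations for running out of memory with torch.double:
# 1) a coarser interpolation grid per dimension (the original uses grid_size=100)
self.model = GPRegressionModel(X_train, y_train, self.likelihood, grid_size=50).double()

# 2) fall back to CPU, where float64 buffers are limited by system RAM
#    rather than GPU memory (slower, but avoids CUDA OOM)
self.model = self.model.cpu()
X_train, y_train = X_train.cpu(), y_train.cpu()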

Read more comments on GitHub >

Top Results From Across the Web

  • [Bug] RuntimeError: NaNs encountered matrix-vector ... - GitHub
  • Numpy matrix multiplication returns nan - Stack Overflow
  • NaNs encountered when trying to perform matrix-vector ...
  • Slowing down matrix multiplication in R - Radford Neal's blog
  • MATLAB cumprod - Cumulative product - MathWorks
