
Using SKIP - NaNs encountered matrix-vector multiplication

See original GitHub issue

We are trying to use SKIP from this notebook, since we want to be able to run GP regression on a large dataset with more than 5 dimensions.

We have followed the steps in the notebook and have implemented the code below. It works most of the time; however, sometimes the line output = self.model(X_train) results in a covariance matrix containing only NaN values. This of course means that training fails when loss = -mll(output, y_train) is run.

What might be the cause of this?

import gpytorch
import torch
import tqdm

# `Model` is the asker's own base class (its import is not shown in the issue).

class GPRegressionModel(gpytorch.models.ExactGP):

    def __init__(self, X_train, y_train, likelihood, grid_size=100):
        super(GPRegressionModel, self).__init__(X_train, y_train, likelihood)
        X_dims = X_train.shape[1]
        self.mean_module = gpytorch.means.ConstantMean()
        self.base_covar_module = gpytorch.kernels.RBFKernel()
        # SKIP: a 1-D KISS-GP (grid interpolation) kernel applied to each input
        # dimension, with the per-dimension kernels multiplied together
        self.covar_module = gpytorch.kernels.ProductStructureKernel(
            gpytorch.kernels.ScaleKernel(
                gpytorch.kernels.GridInterpolationKernel(self.base_covar_module, grid_size=grid_size, num_dims=1)
            ), num_dims=X_dims
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)


class GPyRModelRBFSKIP(Model):

    def __init__(self,
                 likelihood: gpytorch.likelihoods.Likelihood,
                 training_iterations: int = 60,
                 do_standardise: bool = True):
        self.name = 'GPyR_RBF_SKIP'
        self.model = None
        self.likelihood = likelihood
        self.training_iterations = training_iterations
        Model.__init__(self, do_standardise)

    def fit(self, X_train, y_train, standardise: bool = True, verbose: bool = False, batch_size: int = 5_000):

        if self.do_standardise:
            self.update_standardisation_values(X_train, y_train)
            X_train, y_train = self.standardise(X_train, y_train)

        if len(X_train.shape) == 1:
            X_train = X_train.reshape(-1, 1)

        # Convert to tensor
        X_train = torch.tensor(X_train, dtype=torch.float32)
        y_train = torch.tensor(y_train, dtype=torch.float32)

        if torch.cuda.is_available():
            X_train = X_train.cuda()
            y_train = y_train.cuda()
        
        # Build the SKIP model on the (already standardised) training data
        self.model = GPRegressionModel(X_train, y_train, self.likelihood)

        if torch.cuda.is_available():
            self.model = self.model.cuda()
            self.likelihood = self.likelihood.cuda()

        # Train the model
        # Find optimal model hyperparameters
        self.model.train()
        self.likelihood.train()

        optimizer = torch.optim.Adam(self.model.parameters(), lr=0.1)  # Includes GaussianLikelihood parameters

        # "Loss" for GPs - the marginal log likelihood
        mll = gpytorch.mlls.ExactMarginalLogLikelihood(self.likelihood, self.model)

        # Training loop/function from SKIP notebook
        def train():
            iterator = tqdm.tqdm(range(self.training_iterations))
            for i in iterator:
                # Zero backprop gradients
                optimizer.zero_grad()
                with gpytorch.settings.use_toeplitz(False), gpytorch.settings.max_root_decomposition_size(30):
                    # Get output from model and compute the loss
                    output = self.model(X_train)
                    loss = -mll(output, y_train)
                    # Backprop derivatives
                    loss.backward()
                optimizer.step()
                torch.cuda.empty_cache()

        # See dkl_mnist.ipynb for explanation of this flag
        with gpytorch.settings.use_toeplitz(True):
            train()
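
For anyone debugging the same symptom: before computing the loss, it can help to confirm that the NaNs really originate in the covariance rather than the mean. A hypothetical diagnostic (not part of the original code; assumes GPyTorch 1.6, where lazy kernel tensors expose .evaluate()) that could be dropped into fit() after the model is built:

# Hypothetical NaN check on a small subset of the training data.
# Densifying the full SKIP covariance would be far too large, so we
# only evaluate a 256x256 block of the kernel matrix.
with torch.no_grad(), gpytorch.settings.use_toeplitz(False):
    subset = X_train[:256]
    covar_block = self.model.covar_module(subset).evaluate()
    print("NaNs in covariance block:", torch.isnan(covar_block).any().item())
    print("NaNs in mean:", torch.isnan(self.model.mean_module(subset)).any().item())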

The Traceback we get:

Traceback (most recent call last):
  File "D:/OneDrive/OneDrive - Danmarks Tekniske Universitet/DTU-DESKTOP-FEQGP1B/6.semester/Bachelorprojekt/ticra/Bachelor_project_2022/Framework/SKIP_model_investigation.py", line 57, in <module>
    model.fit(X_train, y_train)
  File "D:\OneDrive\OneDrive - Danmarks Tekniske Universitet\DTU-DESKTOP-FEQGP1B\6.semester\Bachelorprojekt\ticra\Bachelor_project_2022\Framework\Models\Gaussian_Process_Regression\with_GPytorch\GPyR_RBF_SKIP.py", line 105, in fit
    train()
  File "D:\OneDrive\OneDrive - Danmarks Tekniske Universitet\DTU-DESKTOP-FEQGP1B\6.semester\Bachelorprojekt\ticra\Bachelor_project_2022\Framework\Models\Gaussian_Process_Regression\with_GPytorch\GPyR_RBF_SKIP.py", line 93, in train
    loss = -mll(output, y_train)
  File "C:\Users\Mølle\AppData\Local\Programs\Python\Python38\lib\site-packages\gpytorch\module.py", line 30, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "C:\Users\Mølle\AppData\Local\Programs\Python\Python38\lib\site-packages\gpytorch\mlls\exact_marginal_log_likelihood.py", line 62, in forward
    res = output.log_prob(target)
  File "C:\Users\Mølle\AppData\Local\Programs\Python\Python38\lib\site-packages\gpytorch\distributions\multivariate_normal.py", line 169, in log_prob
    inv_quad, logdet = covar.inv_quad_logdet(inv_quad_rhs=diff.unsqueeze(-1), logdet=True)
  File "C:\Users\Mølle\AppData\Local\Programs\Python\Python38\lib\site-packages\gpytorch\lazy\lazy_tensor.py", line 1334, in inv_quad_logdet
    inv_quad_term, logdet_term = func(
  File "C:\Users\Mølle\AppData\Local\Programs\Python\Python38\lib\site-packages\gpytorch\functions\_inv_quad_log_det.py", line 157, in forward
    solves, t_mat = lazy_tsr._solve(rhs, preconditioner, num_tridiag=num_random_probes)
  File "C:\Users\Mølle\AppData\Local\Programs\Python\Python38\lib\site-packages\gpytorch\lazy\lazy_tensor.py", line 674, in _solve
    return utils.linear_cg(
  File "C:\Users\Mølle\AppData\Local\Programs\Python\Python38\lib\site-packages\gpytorch\utils\linear_cg.py", line 188, in linear_cg
    raise RuntimeError("NaNs encountered when trying to perform matrix-vector multiplication")
RuntimeError: NaNs encountered when trying to perform matrix-vector multiplication

Version

GPyTorch version: 1.6.0
PyTorch version: 1.11.0+cu113

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5

Top GitHub Comments

1 reaction
KasperNiklas commented, Apr 25, 2022

@Balandat it now works when we use torch.double!

Thank you both for your replies 👍
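
For readers landing here: the fix amounts to building the training tensors in float64 and casting the model and likelihood to double. A minimal sketch of the change against the fit() method above (same variable names; this is a reconstruction, not the exact code from the thread):

# Sketch of the torch.double fix inside fit()
X_train = torch.tensor(X_train, dtype=torch.float64)
y_train = torch.tensor(y_train, dtype=torch.float64)

self.model = GPRegressionModel(X_train, y_train, self.likelihood).double()
self.likelihood = self.likelihood.double()

if torch.cuda.is_available():
    X_train, y_train = X_train.cuda(), y_train.cuda()
    self.model = self.model.cuda()
    self.likelihood = self.likelihood.cuda()

Note that float64 roughly doubles memory use compared to float32, which is consistent with the out-of-memory attempt mentioned in the comment below.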

0 reactions
KasperNiklas commented, Apr 25, 2022

Hi @Balandat, this might be the issue; we will look into it. However, at first try we run out of memory when using torch.double.

Also, thanks for your quick replies! It is awesome that the GPyTorch team is so quick at giving responses 😃
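
One knob that directly affects SKIP's memory footprint is the interpolation grid, grid_size=100 per dimension in the code above. A hedged sketch of two possible mitigations for the float64 out-of-memory case; neither comes from the thread:

# Hypothetical mitigations for running out of memory with torch.double:
# 1) a coarser interpolation grid per dimension (the original uses grid_size=100)
self.model = GPRegressionModel(X_train, y_train, self.likelihood, grid_size=50).double()

# 2) fall back to CPU, where float64 buffers are limited by system RAM
#    rather than GPU memory (slower, but avoids CUDA OOM)
self.model = self.model.cpu()
X_train, y_train = X_train.cpu(), y_train.cpu()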

Read more comments on GitHub >

Top Results From Across the Web

  • [Bug] RuntimeError: NaNs encountered matrix-vector ... - GitHub
  • Numpy matrix multiplication returns nan - Stack Overflow
  • NaNs encountered when trying to perform matrix-vector ...
  • Slowing down matrix multiplication in R - Radford Neal's blog
  • MATLAB cumprod - Cumulative product - MathWorks
