Using SKIP - NaNs encountered in matrix-vector multiplication
We are trying to use SKIP from this notebook, since we want to run GP regression on a large dataset with more than 5 dimensions.
We have followed the steps in the notebook and implemented the code below. It works most of the time; however, sometimes the line `output = self.model(X_train)` produces a covariance matrix containing only NaN values. Training then of course fails when running `loss = -mll(output, y_train)`.
What might be the cause of this?
```python
import torch
import gpytorch
import tqdm


class GPRegressionModel(gpytorch.models.ExactGP):
    def __init__(self, X_train, y_train, likelihood, grid_size=100):
        super(GPRegressionModel, self).__init__(X_train, y_train, likelihood)
        X_dims = X_train.shape[1]
        self.mean_module = gpytorch.means.ConstantMean()
        self.base_covar_module = gpytorch.kernels.RBFKernel()
        # SKIP: a 1-D grid-interpolated RBF kernel per dimension, combined
        # by a product structure over all input dimensions.
        self.covar_module = gpytorch.kernels.ProductStructureKernel(
            gpytorch.kernels.ScaleKernel(
                gpytorch.kernels.GridInterpolationKernel(
                    self.base_covar_module, grid_size=grid_size, num_dims=1
                )
            ),
            num_dims=X_dims,
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
```
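For context, this is our illustration rather than part of the original issue: `ProductStructureKernel` relies on the fact that an RBF kernel with a shared lengthscale factorises into a product of one-dimensional RBF kernels, which is what lets SKIP approximate each factor on a cheap 1-D grid via `GridInterpolationKernel`. A minimal sketch of that product identity, using exact (non-interpolated) kernels:

```python
import torch
import gpytorch

x = torch.randn(4, 3)  # 4 points, 3 input dimensions

# Full 3-D RBF kernel with the same lengthscale in every dimension.
full = gpytorch.kernels.RBFKernel(ard_num_dims=3)
full.lengthscale = torch.ones(1, 3)
K_full = full(x).evaluate()

# Product of three 1-D RBF kernels, one per input dimension.
one_d = gpytorch.kernels.RBFKernel()
one_d.lengthscale = torch.ones(1, 1)
K_prod = torch.ones(4, 4)
for d in range(3):
    K_prod = K_prod * one_d(x[:, d:d + 1]).evaluate()

print(torch.allclose(K_full, K_prod, atol=1e-5))  # True: the RBF factorises over dimensions
```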
```python
class GPyRModelRBFSKIP(Model):  # `Model` is the project's own base class (not shown in the issue)
    def __init__(self,
                 likelihood: gpytorch.likelihoods.Likelihood,
                 training_iterations: int = 60,
                 do_standardise: bool = True):
        self.name = 'GPyR_RBF_SKIP'
        self.model = None
        self.likelihood = likelihood
        self.training_iterations = training_iterations
        Model.__init__(self, do_standardise)

    def fit(self, X_train, y_train, standardise: bool = True, verbose: bool = False, batch_size: int = 5_000):
        if self.do_standardise:
            self.update_standardisation_values(X_train, y_train)
            X_train, y_train = self.standardise(X_train, y_train)
        if len(X_train.shape) == 1:
            X_train = X_train.reshape(-1, 1)

        # Convert to tensors
        X_train = torch.tensor(X_train, dtype=torch.float32)
        y_train = torch.tensor(y_train, dtype=torch.float32)
        if torch.cuda.is_available():
            X_train = X_train.cuda()
            y_train = y_train.cuda()

        # Build the SKIP GP model on the (standardised) training data
        self.model = GPRegressionModel(X_train, y_train, self.likelihood)
        if torch.cuda.is_available():
            self.model = self.model.cuda()
            self.likelihood = self.likelihood.cuda()

        # Train the model: find optimal model hyperparameters
        self.model.train()
        self.likelihood.train()
        optimizer = torch.optim.Adam(self.model.parameters(), lr=0.1)  # Includes GaussianLikelihood parameters
        # "Loss" for GPs - the marginal log likelihood
        mll = gpytorch.mlls.ExactMarginalLogLikelihood(self.likelihood, self.model)

        # Training loop/function from the SKIP notebook
        def train():
            iterator = tqdm.tqdm(range(self.training_iterations))
            for i in iterator:
                # Zero backprop gradients
                optimizer.zero_grad()
                with gpytorch.settings.use_toeplitz(False), gpytorch.settings.max_root_decomposition_size(30):
                    # Get output from model
                    output = self.model(X_train)
                    # Calc loss and backprop derivatives
                    loss = -mll(output, y_train)
                    loss.backward()
                optimizer.step()
                torch.cuda.empty_cache()

        # See dkl_mnist.ipynb for an explanation of this flag
        with gpytorch.settings.use_toeplitz(True):
            train()
```
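To narrow down where the NaNs first appear, one option (a debugging sketch we added, not part of the original code) is to materialise the covariance inside `train()` and check it before evaluating the MLL, so the failing iteration is caught before `linear_cg` runs:

```python
# Hypothetical addition inside the training loop, replacing the loss computation.
with gpytorch.settings.use_toeplitz(False), gpytorch.settings.max_root_decomposition_size(30):
    output = self.model(X_train)
    # Materialising the covariance costs O(n^2) memory - only viable for moderate n.
    covar = output.covariance_matrix
    if torch.isnan(covar).any():
        raise RuntimeError(f"NaNs in SKIP covariance at iteration {i}")
    loss = -mll(output, y_train)
```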
The traceback we get (the error is raised by `linear_cg`, GPyTorch's conjugate-gradients solver, when the kernel matrix it multiplies against already contains NaNs):
```
Traceback (most recent call last):
  File "D:/OneDrive/OneDrive - Danmarks Tekniske Universitet/DTU-DESKTOP-FEQGP1B/6.semester/Bachelorprojekt/ticra/Bachelor_project_2022/Framework/SKIP_model_investigation.py", line 57, in <module>
    model.fit(X_train, y_train)
  File "D:\OneDrive\OneDrive - Danmarks Tekniske Universitet\DTU-DESKTOP-FEQGP1B\6.semester\Bachelorprojekt\ticra\Bachelor_project_2022\Framework\Models\Gaussian_Process_Regression\with_GPytorch\GPyR_RBF_SKIP.py", line 105, in fit
    train()
  File "D:\OneDrive\OneDrive - Danmarks Tekniske Universitet\DTU-DESKTOP-FEQGP1B\6.semester\Bachelorprojekt\ticra\Bachelor_project_2022\Framework\Models\Gaussian_Process_Regression\with_GPytorch\GPyR_RBF_SKIP.py", line 93, in train
    loss = -mll(output, y_train)
  File "C:\Users\Mølle\AppData\Local\Programs\Python\Python38\lib\site-packages\gpytorch\module.py", line 30, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "C:\Users\Mølle\AppData\Local\Programs\Python\Python38\lib\site-packages\gpytorch\mlls\exact_marginal_log_likelihood.py", line 62, in forward
    res = output.log_prob(target)
  File "C:\Users\Mølle\AppData\Local\Programs\Python\Python38\lib\site-packages\gpytorch\distributions\multivariate_normal.py", line 169, in log_prob
    inv_quad, logdet = covar.inv_quad_logdet(inv_quad_rhs=diff.unsqueeze(-1), logdet=True)
  File "C:\Users\Mølle\AppData\Local\Programs\Python\Python38\lib\site-packages\gpytorch\lazy\lazy_tensor.py", line 1334, in inv_quad_logdet
    inv_quad_term, logdet_term = func(
  File "C:\Users\Mølle\AppData\Local\Programs\Python\Python38\lib\site-packages\gpytorch\functions\_inv_quad_log_det.py", line 157, in forward
    solves, t_mat = lazy_tsr._solve(rhs, preconditioner, num_tridiag=num_random_probes)
  File "C:\Users\Mølle\AppData\Local\Programs\Python\Python38\lib\site-packages\gpytorch\lazy\lazy_tensor.py", line 674, in _solve
    return utils.linear_cg(
  File "C:\Users\Mølle\AppData\Local\Programs\Python\Python38\lib\site-packages\gpytorch\utils\linear_cg.py", line 188, in linear_cg
    raise RuntimeError("NaNs encountered when trying to perform matrix-vector multiplication")
RuntimeError: NaNs encountered when trying to perform matrix-vector multiplication
```
Version
GPyTorch version: 1.6.0
PyTorch version: 1.11.0+cu113
Top GitHub Comments
@Balandat it now works when we use torch.double!
Thank you both for your replies 👍
Hi @Balandat, this might be the issue; we will look into that. However, at first try we run out of memory when using torch.double.
Also thanks for your quick replies! It is awesome that the GPyTorch team is so quick at giving responses 😃
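For reference, a sketch of the fix that resolved the issue according to the comments above: cast the training data, model, and likelihood to double precision. Variable names ending in `_np` are illustrative; note the trade-off mentioned above, since float64 doubles memory use and can trigger out-of-memory errors on the GPU.

```python
# Sketch of the resolution from the comments: run everything in float64.
X_train = torch.tensor(X_train_np, dtype=torch.float64)  # X_train_np: illustrative name
y_train = torch.tensor(y_train_np, dtype=torch.float64)

likelihood = gpytorch.likelihoods.GaussianLikelihood().double()
model = GPRegressionModel(X_train, y_train, likelihood).double()
```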