[Bug] Error handling on model predictions
🐛 Bug
When running the code to train a Gaussian process, I get the following error. I haven't encountered an error thrown this way, so I'm unable to debug it post-mortem. I've attached two input CSV files for the program, one of which throws the error (trained_agent.csv) and the other which doesn't. The files are attached: data_files.zip
To reproduce
** Code snippet to reproduce **
import pandas as pd
import numpy as np
import torch
import gpytorch
class MultitaskGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(MultitaskGPModel, self).__init__(train_x, train_y, likelihood)
        num_tasks = train_y.shape[-1]
        self.mean_module = gpytorch.means.MultitaskMean(
            gpytorch.means.ConstantMean(), num_tasks=num_tasks
        )
        self.covar_module = gpytorch.kernels.MultitaskKernel(
            gpytorch.kernels.RBFKernel(), num_tasks=num_tasks, rank=1
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultitaskMultivariateNormal(
            mean_x, covar_x
        )
def check_for_tensor_and_convert(x):
    """Checks if the input is a torch.Tensor. If not, convert it. Also, returns as a float to ensure conformity between tensors.
    Args:
        x (iterable): input data.
    Returns:
        torch.Tensor: converted data.
    """
    if not isinstance(x, torch.Tensor):
        x = np.array(x)
        x = torch.from_numpy(x)
    return x.float()
def convert_array_from_string(array_as_string):
    """Converts a string that appears as a certain kind of list to a numerical array. For example, "[1 2 3]" will be converted to the array [1, 2, 3].
    Args:
        array_as_string (str): The array as a string.
    Returns:
        numpy.ndarray: The converted array.
    """
    l = array_as_string[1:-1].split(' ')
    array = [float(x) for x in l if x != '']
    return np.array(array)
def import_episode_data_from_file(file_name):
    """Imports a file that can be used in a regression problem.
    Args:
        file_name (str): The file containing the data.
    Returns:
        np.array, np.array: Matrices corresponding to the predictors and
        responses to be used for regression.
    """
    df = pd.read_csv(file_name)
    # convert string columns
    df['image'] = df['image'].apply(convert_array_from_string)
    df['action'] = df['action'].apply(convert_array_from_string)
    df['action'] = np.roll(df['action'], -1)

    def concat(row):
        return np.concatenate( [row['action'], row['image']] )

    predictors = df.apply(concat, axis=1)
    predictors = np.vstack(predictors)
    predictors = predictors[:-1, :]
    responses = np.vstack(df['image'][1:])
    return predictors, responses
def train_gp(model, likelihood, train_x, train_y, n_iter=2):
    """Trains a Gaussian process with the given training data.
    Credit to https://gpytorch.readthedocs.io/en/latest/examples/01_Exact_GPs/Simple_GP_Regression.html#Training-the-model
    Args:
        model (gpytorch.models.ExactGP): the model to train.
        likelihood (gpytorch.mlls): A likelihood that is compatible with model.
            for more info.
        train_x (torch.Tensor): The training covariates, as a matrix.
        train_y (torch.Tensor): The training responses, may be a vector or matrix.
        n_iter (int, optional): The number of iterations to train the Gaussian process.
    Returns:
        gpytorch.models.ExactGP, gpytorch.mlls): The trained model and the
        likelihood used.
    """
    # Find optimal model hyperparameters
    model.train()
    likelihood.train()
    # Use the adam optimizer
    optimizer = torch.optim.Adam([
        {'params': model.parameters()},  # Includes GaussianLikelihood parameters
    ], lr=0.01)
    # "Loss" for GPs - the marginal log likelihood
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
    for i in range(n_iter):
        optimizer.zero_grad()
        output = model(train_x)
        loss = -mll(output, train_y)
        loss.backward()
        print('Iter %d/%d - Loss: %.3f' % (i + 1, n_iter, loss.item()))
        optimizer.step()
    return model, likelihood
if __name__ == '__main__':
    X, y = import_episode_data_from_file('logs/test_runs/trained_agent.csv')
    X = check_for_tensor_and_convert(X)
    y = check_for_tensor_and_convert(y)
    likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(
        num_tasks=y.shape[-1]
    )
    model = MultitaskGPModel(X, y, likelihood)
    model, likelihood = train_gp(
        model, likelihood, X, y, n_iter=2
    )
    test_x = check_for_tensor_and_convert(np.ones((10, 34)))
    model.training = False
    with torch.no_grad(), gpytorch.settings.fast_pred_var():
        predictions = likelihood(model(test_x))
** Stack trace/error message **
Traceback (most recent call last):
  File "test.py", line 138, in <module>
    predictions = likelihood(model(test_x))
  File "C:\Users\user\Anaconda3\lib\site-packages\gpytorch\models\exact_gp.py", line 326, in __call__
    predictive_mean, predictive_covar = self.prediction_strategy.exact_prediction(full_mean, full_covar)
  File "C:\Users\user\Anaconda3\lib\site-packages\gpytorch\models\exact_prediction_strategies.py", line 302, in exact_prediction
    self.exact_predictive_mean(test_mean, test_train_covar),
  File "C:\Users\user\Anaconda3\lib\site-packages\gpytorch\models\exact_prediction_strategies.py", line 320, in exact_predictive_mean
    res = (test_train_covar @ self.mean_cache.unsqueeze(-1)).squeeze(-1)
  File "C:\Users\user\Anaconda3\lib\site-packages\gpytorch\utils\memoize.py", line 34, in g
    add_to_cache(self, cache_name, method(self, *args, **kwargs))
  File "C:\Users\user\Anaconda3\lib\site-packages\gpytorch\models\exact_prediction_strategies.py", line 269, in mean_cache
    mean_cache = train_train_covar.inv_matmul(train_labels_offset).squeeze(-1)
  File "C:\Users\user\Anaconda3\lib\site-packages\gpytorch\lazy\lazy_tensor.py", line 939, in inv_matmul
    return func.apply(self.representation_tree(), False, right_tensor, *self.representation())
  File "C:\Users\user\Anaconda3\lib\site-packages\gpytorch\functions\_inv_matmul.py", line 47, in forward
    solves = _solve(lazy_tsr, right_tensor)
  File "C:\Users\user\Anaconda3\lib\site-packages\gpytorch\functions\_inv_matmul.py", line 15, in _solve
    return lazy_tsr._solve(rhs, preconditioner)
  File "C:\Users\user\Anaconda3\lib\site-packages\gpytorch\lazy\lazy_tensor.py", line 655, in _solve
    preconditioner=preconditioner,
  File "C:\Users\user\Anaconda3\lib\site-packages\gpytorch\utils\linear_cg.py", line 271, in linear_cg
    curr_conjugate_vec,
RuntimeError: Expected object of type Variable but found type CPUFloatType for argument #0 'self'
The above operation failed in interpreter, with the following stack trace:
at C:\Users\user\Anaconda3\lib\site-packages\gpytorch\utils\linear_cg.py:55:4
eps,
beta,
residual,
precond_residual,
mul_storage,
is_zero,
curr_conjugate_vec,
):
torch.mul(curr_conjugate_vec, mvms, out=mul_storage)
torch.sum(mul_storage, dim=-2, keepdim=True, out=alpha)
~~~~~~~~~ <--- HERE
# Do a safe division here
torch.lt(alpha, eps, out=is_zero)
alpha.masked_fill_(is_zero, 1)
torch.div(residual_inner_prod, alpha, out=alpha)
alpha.masked_fill_(is_zero, 0)
# We'll cancel out any updates by setting alpha=0 for any vector that has already converged
alpha.masked_fill_(has_converged, 0)
Expected
With the given input data files, the program should simply print the training loss for each iteration and complete without errors. For example,
Iter 1/2 - Loss: 12250.880
Iter 2/2 - Loss: 12203.549
System information
Please complete the following information:
- GPyTorch Version 1.0.1
- PyTorch Version 1.3.0
- Windows 10
Additional context
This happens to work on macOS Mojave Version 10.14.6, running GPyTorch version 0.3.6 and PyTorch version 1.3.0, with both files.
Issue Analytics
- State:
- Created 4 years ago
- Comments:7 (3 by maintainers)
Top GitHub Comments
Hi @khameedk ,
Glancing at the error (I haven't had a chance to run your code yet), it looks like your issue may just be that you need to account for the fact that NumPy's default dtype is float64 (double precision) while torch's is float32 (single precision). Depending on whether you are using a GPU and have lots of data, you'll either need to convert your GP model objects and the like to double (e.g., via model.double()), or convert your data to single precision (e.g., by torch.from_numpy(x).float()).
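To make the two options concrete, here is a minimal sketch of the dtype mismatch and both ways around it. It is illustrative only: a plain torch.nn.Linear stands in for the GP model (an assumption for brevity; gpytorch models are torch.nn.Module subclasses, so .double() is called the same way), and the array shapes are made up.

```python
import numpy as np
import torch

# NumPy creates float64 arrays by default, and torch.from_numpy keeps that dtype.
x_np = np.ones((4, 3))
x_double = torch.from_numpy(x_np)
print(x_double.dtype)  # torch.float64

# Option 1: cast the data down to torch's default float32.
x_single = x_double.float()
print(x_single.dtype)  # torch.float32

# Option 2: keep float64 data and cast the module's parameters instead.
# torch.nn.Linear is a stand-in for the GP model here (assumption for illustration).
layer = torch.nn.Linear(3, 2).double()
out = layer(x_double)
print(out.dtype)  # torch.float64
```

Whichever option is chosen, the training inputs, training targets, test inputs, and model parameters should all end up with the same dtype.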
Also, looking at the output you give for the loss: those losses are gigantic, and suggest to me that you haven't normalized your data. In general, the initial hyperparameter settings we use in GPyTorch roughly assume standardized / normalized training features and labels, so you'll likely either need to normalize your data or change the hyperparameter initializations to get good performance.
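One common way to normalize is a column-wise z-score. The sketch below assumes that approach; the helper name standardize, the eps threshold, and the synthetic data are all hypothetical and not taken from the issue's CSV files.

```python
import numpy as np


def standardize(a, eps=1e-8):
    """Column-wise z-score; returns the scaled array and the statistics used."""
    mean = a.mean(axis=0)
    std = a.std(axis=0)
    # Constant columns have zero std; leave them centered but unscaled
    # to avoid dividing by zero.
    std = np.where(std < eps, 1.0, std)
    return (a - mean) / std, mean, std


# Synthetic stand-in for the predictors (assumption: real data comes from the CSV).
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=20.0, size=(100, 5))
X[:, 0] = 3.0  # a constant column, to exercise the zero-std guard

X_std, x_mean, x_scale = standardize(X)
print(X_std.mean(axis=0))  # approximately zero in every column
print(X_std.std(axis=0))   # approximately one, except the constant column
```

Test inputs should be transformed with the training mean and scale (x_mean and x_scale here) rather than with their own statistics, and predictions on standardized labels need to be mapped back to the original units.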
Just for completeness I ran this:
With the output:
Here are the column-wise means and standard deviations
We'll try parameter tuning shortly to try to fix the large losses.