Model training slow in beta version

Training a GP model in the recent beta version takes significantly longer than in the older alpha version. I tested this on a simple 1D regression task (taken from a notebook) and found that the average training time is 2.32 seconds in the beta version, compared to 1.21 seconds in the alpha version, for the exact same script (pasted below), except that random_variables is replaced by distributions. The script was run on an Intel i7 CPU @ 2.7 GHz (no CUDA). Is this expected behavior?

import math
import time
import torch
import numpy as np
import gpytorch

from gpytorch.likelihoods import GaussianLikelihood
from gpytorch.means import ZeroMean
from gpytorch.kernels import RBFKernel


class GPModel(gpytorch.models.ExactGP):
	def __init__(self, train_x, train_y, likelihood, version='alpha'):
		super(GPModel, self).__init__(train_x, train_y, likelihood)
		self.mean_module = ZeroMean()
		self.covar_module = RBFKernel()
		self.version = version

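	def forward(self, x):
		mean = self.mean_module(x)
		covar = self.covar_module(x)
		# The only difference between the two runs: the beta version returns a
		# distribution, the alpha version returns a random variable.
		if self.version == 'beta':
			return gpytorch.distributions.MultivariateNormal(mean, covar)
		else:
			return gpytorch.random_variables.GaussianRandomVariable(mean, covar)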

class GP(object):
	def __init__(self, train_x, train_y, version):
		self.likelihood = GaussianLikelihood()
		self.model = GPModel(train_x, train_y, self.likelihood, version)
		self.optimizer = torch.optim.Adam([{'params': self.model.parameters()}], lr=.1)
		self.mll = gpytorch.mlls.ExactMarginalLogLikelihood(self.likelihood, self.model)
		self.train_x = train_x
		self.train_y = train_y

	def fit(self, max_iterations):
		for i in range(max_iterations):
			self.optimizer.zero_grad()
			output = self.model(self.train_x)
			loss = -self.mll(output, self.train_y)
			loss.backward()
			self.optimizer.step()


if __name__ == '__main__':
	train_x = torch.linspace(0, 1, 100)
	train_y = torch.sin(train_x * (2*math.pi)) + torch.randn(train_x.size())*.2

	num_sims = 5
	all_times = []
	for _ in range(num_sims):
		start = time.time()
		gp = GP(train_x, train_y, 'beta')
		gp.fit(max_iterations=200)
		end = time.time()
		all_times.append(end - start)
	mean_time = np.mean(all_times)
	print(mean_time)
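
A note on the measurement: the loop above times each run with `time.time()` and includes model construction in the timed region. Below is a minimal sketch of a slightly more careful harness that assumes nothing about GPyTorch itself. `time_runs` and `dummy_fit` are hypothetical names introduced here, and `dummy_fit` is only a stand-in for the real training call (for example `GP(train_x, train_y, 'beta').fit(max_iterations=200)`).

```python
import statistics
import time


def time_runs(fn, repeats=5, warmup=1):
    """Call fn() `warmup` times untimed, then return `repeats` wall-clock timings in seconds."""
    for _ in range(warmup):
        fn()  # untimed: absorbs one-off costs such as lazy initialisation or caches
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def dummy_fit():
    # Hypothetical stand-in workload; replace with the real training call.
    sum(i * i for i in range(10_000))


timings = time_runs(dummy_fit, repeats=5, warmup=1)
print(f"mean {statistics.mean(timings):.6f}s, stdev {statistics.stdev(timings):.6f}s")
```

Reporting the spread alongside the mean makes it easier to tell whether a gap of about one second between two versions is larger than the run-to-run noise.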

Issue Analytics

Created 5 years ago
Comments: 13 (8 by maintainers)

Top GitHub Comments

sumitsk commented, Oct 5, 2018

If this helps, for 500 data points, the training time in PyTorch 1.0 is ~4 seconds compared to ~2.8 seconds in PyTorch 0.4. For 1000 data points, the difference is also ~1 second. It seems that there is some constant additional overhead somewhere. This is for the training part only; I haven’t checked the model prediction runtime.
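
One quick way to check the "constant overhead" reading is to compare the gap between versions across dataset sizes. If the gap stays flat while the totals grow, the extra cost is probably fixed per run or per iteration rather than tied to the size of the kernel matrix. The sketch below uses only the approximate figures quoted in this thread (100 points from the original report, 500 points from this comment). `gap_by_size` is a hypothetical helper name, and treating the alpha/beta comparison and the PyTorch 0.4/1.0 comparison as the same experiment is an assumption.

```python
def gap_by_size(old_times, new_times):
    """Return {n: new - old} for every dataset size present in both dicts."""
    return {n: new_times[n] - old_times[n] for n in sorted(old_times) if n in new_times}


# Seconds per training run, as quoted in this thread (approximate values).
old = {100: 1.21, 500: 2.8}
new = {100: 2.32, 500: 4.0}

gaps = gap_by_size(old, new)
for n, gap in gaps.items():
    print(f"n={n}: gap {gap:.2f}s, ratio {new[n] / old[n]:.2f}x")
```

With these numbers the absolute gap stays between roughly 1.1 and 1.2 seconds, while the ratio falls from about 1.9x to about 1.4x, which is the pattern a fixed overhead would produce.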

jacobrgardner commented, Nov 28, 2018

As of d971342, things are about as fast as we could hope for, I’d say! We’ve been doing some testing, and exact GPs on a GPU train on up to 16k data points at less than half a second per iteration!