
Model training slow in beta version

See original GitHub issue

Training a GP model in the very recent beta version takes significantly more time than in the older alpha version. I tested this on a simple 1D regression task (taken from the example notebook) and found that the average training time in the beta version is 2.32 seconds, compared to 1.21 seconds in the alpha version, for the exact same script (pasted below) except with random_variables replaced by distributions. The script is run on an Intel i7 CPU @ 2.7 GHz (no CUDA). Is this expected behavior?

import math 
import time
import torch
import numpy as np
import gpytorch 

from gpytorch.likelihoods import GaussianLikelihood
from gpytorch.means import ZeroMean
from gpytorch.kernels import RBFKernel
# import ipdb


class GPModel(gpytorch.models.ExactGP):
	def __init__(self, train_x, train_y, likelihood, version='alpha'):
		super(GPModel, self).__init__(train_x, train_y, likelihood)
		self.mean_module = ZeroMean()
		self.covar_module = RBFKernel()
		self.version = version

	def forward(self, x):
		mean = self.mean_module(x)
		covar = self.covar_module(x)
		if self.version == 'beta':
			return gpytorch.distributions.MultivariateNormal(mean, covar)
		else:
			return gpytorch.random_variables.GaussianRandomVariable(mean, covar)

class GP(object):
	def __init__(self, train_x, train_y, version):
		self.likelihood = GaussianLikelihood()
		self.model = GPModel(train_x, train_y, self.likelihood, version)
		self.optimizer = torch.optim.Adam([{'params': self.model.parameters()}], lr=.1)
		self.mll = gpytorch.mlls.ExactMarginalLogLikelihood(self.likelihood, self.model)
		self.train_x = train_x
		self.train_y = train_y

	def fit(self, max_iterations):
		for i in range(max_iterations):
			self.optimizer.zero_grad()
			output = self.model(self.train_x)
			loss = -self.mll(output, self.train_y)
			loss.backward()
			self.optimizer.step()


if __name__ == '__main__':
	train_x = torch.linspace(0, 1, 100)
	train_y = torch.sin(train_x * (2*math.pi)) + torch.randn(train_x.size())*.2

	num_sims = 5
	all_times = []
	for _ in range(num_sims):
		start = time.time()
		gp = GP(train_x, train_y, 'beta')
		gp.fit(max_iterations=200)
		end = time.time()
		all_times.append(end - start)
	mean_time = np.mean(all_times)
	print(mean_time)
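
A quick way to narrow down where the extra time goes is to time the setup (model, likelihood, and MLL construction) separately from the 200 Adam iterations. The sketch below is an editorial addition rather than part of the original report; it reuses the GP class, train_x, train_y, and num_sims defined in the script above.

# Sketch: separate the one-time setup cost from the optimization loop.
setup_times, fit_times = [], []
for _ in range(num_sims):
	t0 = time.time()
	gp = GP(train_x, train_y, 'beta')  # model, likelihood, and MLL construction
	t1 = time.time()
	gp.fit(max_iterations=200)         # 200 Adam iterations
	t2 = time.time()
	setup_times.append(t1 - t0)
	fit_times.append(t2 - t1)
print('setup:', np.mean(setup_times), 'fit:', np.mean(fit_times))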

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments:13 (8 by maintainers)

Top GitHub Comments

2 reactions
sumitsk commented, Oct 5, 2018

If this helps, for 500 data points the training time in PyTorch 1.0 is ~4 seconds compared to 2.8 in PyTorch 0.4. For 1000 data points the difference is also ~1 second. It seems that there is some constant additional overhead somewhere. This is for the training part only; I haven’t checked the model prediction runtime.
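
One way to test the constant-overhead hypothesis is to repeat the timing at several dataset sizes and see whether the gap stays roughly fixed or grows with the number of points. The sketch below is an editorial addition; it reuses the GP class from the script in the issue, and the sizes follow the ones mentioned in this comment.

# Sketch: if the alpha/beta gap is about the same at every size,
# the extra cost is a constant per-run overhead rather than
# something that scales with the number of data points.
for n in [100, 500, 1000]:
	x = torch.linspace(0, 1, n)
	y = torch.sin(x * (2 * math.pi)) + torch.randn(x.size()) * .2
	start = time.time()
	gp = GP(x, y, 'beta')
	gp.fit(max_iterations=200)
	print(n, time.time() - start)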

0 reactions
jacobrgardner commented, Nov 28, 2018

As of d971342, things are about as fast as we could hope for, I’d say! We’ve been doing some testing, and exact GPs on a GPU train on up to 16k data points at less than half a second per iteration!
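
For readers who want to reproduce a per-iteration GPU timing like the one quoted here, a sketch along these lines should work with the same GPModel class from the issue. It is an editorial addition: it assumes a CUDA device with enough memory is available, uses torch.cuda.synchronize() so asynchronous kernel launches don't skew the measurement, and mirrors the 16k data points mentioned in the comment.

# Sketch: per-iteration timing of exact GP training on a GPU,
# reusing the GPModel class defined in the issue's script.
n = 16000
x = torch.linspace(0, 1, n).cuda()
y = torch.sin(x * (2 * math.pi)) + torch.randn_like(x) * .2
likelihood = GaussianLikelihood().cuda()
model = GPModel(x, y, likelihood, version='beta').cuda()
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
optimizer = torch.optim.Adam(model.parameters(), lr=.1)
for i in range(10):
	torch.cuda.synchronize()
	start = time.time()
	optimizer.zero_grad()
	loss = -mll(model(x), y)
	loss.backward()
	optimizer.step()
	torch.cuda.synchronize()
	print('iteration', i, time.time() - start)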

Read more comments on GitHub

