Training time comparison between GPyTorch and GPflow.
Hi all!
After reading the paper “GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration”, I was interested in migrating from GPflow to GPyTorch for the obvious reason of training speed. I recently started experimenting with GPyTorch, but to my surprise GPflow trains faster than GPyTorch in my experiments.
In a 25-dimensional input space with 375 training points, on a Tesla P100 GPU, GPflow training with 500 Adam iterations takes approximately 3 s, while GPyTorch training with the same 500 Adam iterations takes about 11 s.
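My data loading is not shown here; as a purely hypothetical stand-in with the same shapes (the dim, x_train, and y_train names are what the two snippets further down refer to, not my real data), something like this could be used:

import numpy as np
import torch

dim = 25                                           # input dimension
x_np = np.random.rand(375, dim)                    # 375 training points in 25-D
y_np = np.sin(x_np.sum(axis=1, keepdims=True))     # arbitrary smooth target, shape (375, 1)

# GPflow's GPR consumes the NumPy arrays directly; GPyTorch wants torch
# tensors with a 1-D target.
x_train = torch.from_numpy(x_np).float()
y_train = torch.from_numpy(y_np).float().squeeze(-1)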
When monitoring GPU usage via watch -n 2 nvidia-smi, the TensorFlow code claims nearly all of the GPU memory (16280 MiB) while the PyTorch code uses only about 800 MiB, which is expected since PyTorch allocates GPU memory dynamically. More importantly, during training GPflow keeps the GPU at around 70% utilization (Volatile GPU-Util), whereas GPyTorch does not exceed 22%.
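For what it's worth, the full-card memory number on the TensorFlow side is just its default allocator reserving everything up front; if a fairer memory comparison is wanted, TensorFlow 1.x can be asked to grow allocations on demand. This is only a sketch, since how the session config is passed through to GPflow depends on the GPflow version:

import tensorflow as tf

# Ask TensorFlow 1.x to allocate GPU memory on demand rather than
# reserving the whole card at session creation (its default behaviour).
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)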
The following code is used for GPyTorch:
import torch
import gpytorch

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

x_cuda = x_train.cuda()
y_cuda = y_train.cuda()
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(x_cuda, y_cuda, likelihood).cuda()
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

model.train()
likelihood.train()
optimizerm = torch.optim.Adam([{'params': model.mean_module.parameters()},
                               {'params': model.covar_module.parameters()},
                               {'params': model.likelihood.parameters()}], lr=0.1)

def train(training_iter=500):
    for k in range(training_iter):
        optimizerm.zero_grad()
        outputm = model(x_cuda)
        loss = -mll(outputm, y_cuda)
        loss.backward()
        # print('Iter %d/%d - Loss: %.3f' % (k + 1, training_iter, loss.item()))
        optimizerm.step()

with gpytorch.settings.use_toeplitz(False):
    train()
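One caveat on timing this: CUDA launches are asynchronous, so wall-clock measurements should synchronize the device around the call. A sketch of how the run above can be timed robustly (same call, just wrapped):

import time

torch.cuda.synchronize()                 # flush pending kernels before starting the clock
start = time.perf_counter()
with gpytorch.settings.use_toeplitz(False):
    train()
torch.cuda.synchronize()                 # wait for the last optimizer step to finish on the GPU
print('GPyTorch training time: %.2f s' % (time.perf_counter() - start))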
And the following code is used for GPflow:
import gpflow
from gpflow.train import AdamOptimizer
from gpflow.actions import Loop

# x_train, y_train are NumPy arrays here (shape (375, 25) and (375, 1));
# GPflow 1.x works on NumPy inputs rather than torch tensors.
likelihood = gpflow.likelihoods.Gaussian()  # unused: GPR already carries a Gaussian likelihood
kernel = gpflow.kernels.RBF(dim)
model = gpflow.models.GPR(x_train, y_train, kern=kernel)

adam_action = AdamOptimizer(learning_rate=0.1).make_optimize_action(model)
actions = [adam_action]
Loop(actions, stop=5000)()
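For reference, the same Adam run can also be expressed without the actions machinery in GPflow 1.x (a sketch assuming that API, with maxiter set to match the 500 iterations quoted above) and timed the same way:

import time

start = time.perf_counter()
AdamOptimizer(learning_rate=0.1).minimize(model, maxiter=500)
print('GPflow training time: %.2f s' % (time.perf_counter() - start))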
Thank you in advance.
Top GitHub Comments
@eytan good point. That threshold was chosen pretty arbitrarily, and it may move around. Let me do a benchmark in a little bit and see if it should be e.g. 512.
I think we should also have a goal for the package that optimizations we do for large-scale settings don't push that crossover up too far. Maybe a good design goal would be to keep the threshold under 1000 for when CG starts becoming clearly faster?

@Hebbalali, I'd be curious how the performance compares if you use 255 training points instead of 300. AFAIK the choice of when we switch from Cholesky to CG (at 256 points) is somewhat arbitrary, so it's possible that we may wish to increase that threshold as the default.
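Depending on the GPyTorch version installed, that Cholesky/CG switch point is exposed as gpytorch.settings.max_cholesky_size, so the effect can be tested directly on the 375-point problem above. A minimal sketch, assuming that setting is available in your version:

# Raise the switch point so Cholesky is used for anything up to 2000 points,
# keeping CG/Lanczos out of the picture for this 375-point data set.
with gpytorch.settings.max_cholesky_size(2000), gpytorch.settings.use_toeplitz(False):
    train()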