Training time comparison between GPyTorch and GPflow.
Hi all!
After reading the paper “GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration”, I was interested in migrating from GPflow to GPyTorch for the obvious reason of training speed. I recently started experimenting with GPyTorch, but to my surprise GPflow trains faster than GPyTorch in my experiments.
In a 25-dimensional input space with 375 training points, on a Tesla P100 GPU, GPflow training with 500 Adam iterations takes approximately 3 s, while GPyTorch training with the same 500 Adam iterations takes about 11 s.
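My data loading is not shown here; as a purely hypothetical stand-in with the same shapes (the dim, x_train, and y_train names are what the two snippets further down refer to, not my real data), something like this could be used:

import numpy as np
import torch

dim = 25                                           # input dimension
x_np = np.random.rand(375, dim)                    # 375 training points in 25-D
y_np = np.sin(x_np.sum(axis=1, keepdims=True))     # arbitrary smooth target, shape (375, 1)

# GPflow's GPR consumes the NumPy arrays directly; GPyTorch wants torch
# tensors with a 1-D target.
x_train = torch.from_numpy(x_np).float()
y_train = torch.from_numpy(y_np).float().squeeze(-1)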
When monitoring GPU usage via watch -n 2 nvidia-smi, the TensorFlow code claims nearly all of the GPU memory (16280 MiB) while the PyTorch code uses only about 800 MiB, which is expected since PyTorch allocates GPU memory dynamically. More importantly, during training GPflow keeps the GPU at around 70% utilization (Volatile GPU-Util), whereas GPyTorch does not exceed 22%.
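For what it's worth, the full-card memory number on the TensorFlow side is just its default allocator reserving everything up front; if a fairer memory comparison is wanted, TensorFlow 1.x can be asked to grow allocations on demand. This is only a sketch, since how the session config is passed through to GPflow depends on the GPflow version:

import tensorflow as tf

# Ask TensorFlow 1.x to allocate GPU memory on demand rather than
# reserving the whole card at session creation (its default behaviour).
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)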
The following code is used for GPyTorch:
import torch
import gpytorch

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

x_cuda = x_train.cuda()
y_cuda = y_train.cuda()
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(x_cuda, y_cuda, likelihood).cuda()
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

model.train()
likelihood.train()
optimizerm = torch.optim.Adam([{'params': model.mean_module.parameters()},
                               {'params': model.covar_module.parameters()},
                               {'params': model.likelihood.parameters()}], lr=0.1)

def train(training_iter=500):
    for k in range(training_iter):
        optimizerm.zero_grad()
        outputm = model(x_cuda)
        loss = -mll(outputm, y_cuda)
        loss.backward()
        # print('Iter %d/%d - Loss: %.3f' % (k + 1, training_iter, loss.item()))
        optimizerm.step()

with gpytorch.settings.use_toeplitz(False):
    train()
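One caveat on timing this: CUDA launches are asynchronous, so wall-clock measurements should synchronize the device around the call. A sketch of how the run above can be timed robustly (same call, just wrapped):

import time

torch.cuda.synchronize()                 # flush pending kernels before starting the clock
start = time.perf_counter()
with gpytorch.settings.use_toeplitz(False):
    train()
torch.cuda.synchronize()                 # wait for the last optimizer step to finish on the GPU
print('GPyTorch training time: %.2f s' % (time.perf_counter() - start))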
And the following code is used for GPflow:
import gpflow
from gpflow.train import AdamOptimizer
from gpflow.actions import Loop

# x_train, y_train are NumPy arrays here (shape (375, 25) and (375, 1));
# GPflow 1.x works on NumPy inputs rather than torch tensors.
likelihood = gpflow.likelihoods.Gaussian()  # unused: GPR already carries a Gaussian likelihood
kernel = gpflow.kernels.RBF(dim)
model = gpflow.models.GPR(x_train, y_train, kern=kernel)

adam_action = AdamOptimizer(learning_rate=0.1).make_optimize_action(model)
actions = [adam_action]
Loop(actions, stop=5000)()
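For reference, the same Adam run can also be expressed without the actions machinery in GPflow 1.x (a sketch assuming that API, with maxiter set to match the 500 iterations quoted above) and timed the same way:

import time

start = time.perf_counter()
AdamOptimizer(learning_rate=0.1).minimize(model, maxiter=500)
print('GPflow training time: %.2f s' % (time.perf_counter() - start))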
Thank you in advance.
Top GitHub Comments
@eytan good point. That threshold was chosen pretty arbitrarily, and it may move around. Let me do a benchmark in a little bit and see if it should be e.g. 512.
I think we should also have a goal for the package that optimizations we do for large-scale settings don't push that crossover up too far. Maybe a good design goal would be to keep the threshold under 1000 for when CG starts becoming clearly faster?

@Hebbalali, I'd be curious how the performance compares if you use 255 training points instead of 300. AFAIK the choice of when we switch from Cholesky to CG (at 256 points) is somewhat arbitrary, so it's possible that we may wish to increase that threshold as the default.
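Depending on the GPyTorch version installed, that Cholesky/CG switch point is exposed as gpytorch.settings.max_cholesky_size, so the effect can be tested directly on the 375-point problem above. A minimal sketch, assuming that setting is available in your version:

# Raise the switch point so Cholesky is used for anything up to 2000 points,
# keeping CG/Lanczos out of the picture for this 375-point data set.
with gpytorch.settings.max_cholesky_size(2000), gpytorch.settings.use_toeplitz(False):
    train()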