[Question] How to implement the AdditiveStructureKernel to make the training and inference faster
See original GitHub issue

I would like to train a GP model on very high-dimensional X. I first decompose X into 27 subspaces and then use the sum of 27 MaternKernels as the covar_module; however, this is even slower than a single ScaleKernel over the original X without decomposition. What should I do?
**Code snippet to reproduce**
```python
import torch
import gpytorch
from gpytorch.constraints import Interval
from gpytorch.kernels import AdditiveStructureKernel, MaternKernel, ScaleKernel
from gpytorch.likelihoods import GaussianLikelihood
from gpytorch.mlls import ExactMarginalLogLikelihood

def split(a, n):
    # split array a into n approximately equal splits
    k, m = divmod(len(a), n)
    return [tuple(a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]) for i in range(n)]

# for simplicity, I use random values here; the original is normalized x and y
train_x = torch.randn(30, 6912).to('cuda:0')
train_y = torch.randn(30).to('cuda:0')
subspace_dim_list = split(list(range(train_x.shape[1])), 27)
lengthscale_constraint = Interval(0.005, 2.0)
outputscale_constraint = Interval(0.05, 20.0)
for i in range(27):  # range(self.n_sub)
    # get active dimensions for each subspace_id
    subspace_dim = subspace_dim_list[i]
    kern_i = MaternKernel(lengthscale_constraint=lengthscale_constraint,
                          ard_num_dims=len(subspace_dim), nu=2.5,
                          active_dims=subspace_dim)
    if i == 0:
        kern = kern_i
    else:
        kern += kern_i
covar_module = AdditiveStructureKernel(base_kernel=kern, num_dims=train_x.shape[1])
noise_constraint = Interval(1e-6, 2e-6)
likelihood = GaussianLikelihood(noise_constraint=noise_constraint).to(
    device=train_x.device, dtype=train_y.dtype)
# GP is my ExactGP subclass that takes covar_module as a constructor argument
sub_model = GP(
    train_x=train_x,
    train_y=train_y,
    likelihood=likelihood,
    covar_module=covar_module,
).to(device=train_x.device, dtype=train_x.dtype)
# Find optimal model hyperparameters
sub_model.train()
likelihood.train()
# "Loss" for GPs - the marginal log likelihood
mll = ExactMarginalLogLikelihood(likelihood, sub_model)
optimizer = torch.optim.Adam([{"params": sub_model.parameters()}], lr=0.1)
with torch.enable_grad(), gpytorch.settings.max_cholesky_size(2000):
    for _ in range(1000):
        optimizer.zero_grad()
        output = sub_model(train_x)
        loss = -mll(output, train_y)
        loss.backward()
        optimizer.step()
model_log_likelihood = -loss  # mll is negated in the loss
```
However, running this code on a single GPU takes 384 s. If I instead use
```python
kern = MaternKernel(lengthscale_constraint=lengthscale_constraint,
                    ard_num_dims=train_x.shape[1], nu=2.5)
covar_module = ScaleKernel(kern, outputscale_constraint=outputscale_constraint)
```
it only needs 58 s. My original goal was to make the additive version faster than those 58 s, ideally around 10 s.
Have I implemented the AdditiveStructureKernel correctly, or what should I do? Thank you very much.
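For reference, my reading of the GPyTorch docs is that `AdditiveStructureKernel` applies a *single* base kernel to every input dimension in batch and sums the per-dimension terms, rather than wrapping an explicit sum of kernels. A minimal sketch of that documented usage, reusing `train_x` from above:

```python
# Minimal sketch of the documented AdditiveStructureKernel usage:
# one base kernel is evaluated on every input dimension in batch,
# and the per-dimension covariances are summed.
from gpytorch.kernels import AdditiveStructureKernel, MaternKernel

covar_module = AdditiveStructureKernel(
    base_kernel=MaternKernel(nu=2.5),  # applied to each dimension separately
    num_dims=train_x.shape[1],         # 6912 one-dimensional additive terms
)
```

If that reading is right, wrapping a 27-term sum kernel as above may defeat the batching that makes `AdditiveStructureKernel` fast.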
By the way, I followed what GPy does; I will paste their code here as well:
```python
def _create_model_sub(self, X, Y, input_dim_permutate_list):
    """
    Creates the model for a subspace of dimensions
    :param X: observed input data
    :param Y: observed output data
    :param input_dim_permutate_list: shuffled input dimension list
    """
    # split the input dimensions into n subspaces
    subspace_dim_list = split(input_dim_permutate_list, self.n_sub)
    # define the additive kernel
    for i in range(self.n_sub):  # 27
        # get active dimensions for each subspace_id
        subspace_dim = subspace_dim_list[i]
        kern_i = GPy.kern.Matern52(len(subspace_dim), variance=1., ARD=self.ARD,
                                   active_dims=subspace_dim, name=f'k{i}')
        if i == 0:
            kern = kern_i
        else:
            kern += kern_i
    print(kern)
    # define GP model
    noise_var = Y.var() * 0.01 if self.noise_var is None else self.noise_var
    sub_model = GPy.models.GPRegression(X, Y, kernel=kern, noise_var=noise_var)
    # restrict noise variance for exact evaluations of the objective
    sub_model.Gaussian_noise.constrain_fixed(1e-6, warning=False)
    # optimise the GP hyperparameters
    try:
        sub_model.optimize(optimizer=self.optimizer, max_iters=self.max_iters,
                           messages=False, ipython_notebook=False)
        model_log_likelihood = sub_model.log_likelihood()
    except Exception:
        model_log_likelihood = -100.00
    return sub_model, model_log_likelihood
```
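A possible alternative I am considering for the speed-up (a sketch, assuming the 6912 dimensions split evenly into 27 subspaces of 256): give the GPyTorch `MaternKernel` a batch dimension of size 27 and reshape the inputs so each subspace becomes one batch entry, so all 27 covariance matrices are computed in a single batched call instead of a 27-iteration Python loop:

```python
# Sketch: evaluate 27 subspace Matern kernels in one batched call.
# Assumes the 6912 input dims split evenly into 27 subspaces of 256 dims.
import torch
from gpytorch.kernels import MaternKernel

n_sub, sub_dim = 27, 256
kern = MaternKernel(nu=2.5, ard_num_dims=sub_dim,
                    batch_shape=torch.Size([n_sub]))

x = torch.randn(30, n_sub * sub_dim)
# (30, 6912) -> (27, 30, 256): each subspace becomes a batch entry
x_batched = x.view(30, n_sub, sub_dim).transpose(0, 1)
K = kern(x_batched).to_dense()  # (27, 30, 30); .evaluate() on older GPyTorch
K_additive = K.sum(dim=0)       # additive kernel: sum over subspaces
```

Embedding this in a model would need a small custom `Kernel` subclass whose forward does the reshape and sum; the point here is only that the 27 terms can be computed in one batched call.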
**Top GitHub Comments**
What exactly are you comparing here? Can you provide code examples?
Thanks for your reply @gpleiss. Yes, I have also found that additive decompositions are slower than non-additive ones. However, I have an extra question: since we all expect the GPU to run faster than the CPU, I compared GPyTorch against GPy and found that in high-dimensional spaces GPyTorch does not run faster than GPy. Have you compared the speed with other repositories? How can I improve my speed in GPyTorch? The comparison was done on 1 GPU and 28 CPUs. I have already evaluated 30 points and want to infer the next point with a Matern 5/2 kernel.
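On the GPU-vs-CPU comparison above, one general-purpose check (a sketch, not from this thread) is to time with explicit CUDA synchronization: PyTorch launches GPU kernels asynchronously, so unsynchronized timings can misrepresent either library. With only 30 training points, the solve is tiny and Python-loop overhead over 27 kernels can dominate, which also narrows the GPU's advantage. The sketch reuses `sub_model`, `mll`, `optimizer`, `train_x`, and `train_y` from the snippet above:

```python
# Sketch: fair wall-clock timing of the GPyTorch training loop on GPU.
# CUDA kernels launch asynchronously, so synchronize before reading the clock.
import time
import torch

torch.cuda.synchronize()
start = time.perf_counter()
for _ in range(1000):
    optimizer.zero_grad()
    output = sub_model(train_x)
    loss = -mll(output, train_y)
    loss.backward()
    optimizer.step()
torch.cuda.synchronize()
print(f"training took {time.perf_counter() - start:.1f} s")
```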