[Question] How to implement the AdditiveStructureKernel to make the training and inference faster
See original GitHub issue

I would like to train a GP model on very high-dimensional X. I first decompose X into 27 subspaces and then use the sum of 27 MaternKernels as the covar_module; however, this is even slower than a single ScaleKernel over the original X without decomposition. What should I do?
**Code snippet to reproduce**
```python
import torch
import gpytorch
from gpytorch.constraints import Interval
from gpytorch.kernels import AdditiveStructureKernel, MaternKernel, ScaleKernel
from gpytorch.likelihoods import GaussianLikelihood
from gpytorch.mlls import ExactMarginalLogLikelihood

def split(a, n):
    # split array a into n approximately equal splits
    k, m = divmod(len(a), n)
    return [tuple(a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]) for i in range(n)]

# for simplicity, I use random values here; the original is normalized x and y
train_x = torch.randn(30, 6912).to('cuda:0')
train_y = torch.randn(30).to('cuda:0')
subspace_dim_list = split(list(range(train_x.shape[1])), 27)
lengthscale_constraint = Interval(0.005, 2.0)
outputscale_constraint = Interval(0.05, 20.0)
for i in range(27):  # range(self.n_sub)
    # get active dimensions for each subspace_id
    subspace_dim = subspace_dim_list[i]
    kern_i = MaternKernel(lengthscale_constraint=lengthscale_constraint,
                          ard_num_dims=len(subspace_dim), nu=2.5,
                          active_dims=subspace_dim)
    if i == 0:
        kern = kern_i
    else:
        kern += kern_i
covar_module = AdditiveStructureKernel(base_kernel=kern, num_dims=train_x.shape[1])
noise_constraint = Interval(1e-6, 2e-6)
likelihood = GaussianLikelihood(noise_constraint=noise_constraint).to(
    device=train_x.device, dtype=train_y.dtype)
# GP is my ExactGP subclass that takes covar_module as a constructor argument
sub_model = GP(
    train_x=train_x,
    train_y=train_y,
    likelihood=likelihood,
    covar_module=covar_module,
).to(device=train_x.device, dtype=train_x.dtype)
# Find optimal model hyperparameters
sub_model.train()
likelihood.train()
# "Loss" for GPs - the marginal log likelihood
mll = ExactMarginalLogLikelihood(likelihood, sub_model)
optimizer = torch.optim.Adam([{"params": sub_model.parameters()}], lr=0.1)
with torch.enable_grad(), gpytorch.settings.max_cholesky_size(2000):
    for _ in range(1000):
        optimizer.zero_grad()
        output = sub_model(train_x)
        loss = -mll(output, train_y)
        loss.backward()
        optimizer.step()
model_log_likelihood = -loss  # mll is negated in the loss
```
However, running this code on a single GPU takes 384 s. If I instead use
```python
kern = MaternKernel(lengthscale_constraint=lengthscale_constraint,
                    ard_num_dims=train_x.shape[1], nu=2.5)
covar_module = ScaleKernel(kern, outputscale_constraint=outputscale_constraint)
```
it only needs 58 s. My original goal was to make the additive version faster than those 58 s, ideally around 10 s.
Have I implemented the AdditiveStructureKernel correctly, or what should I do? Thank you very much.
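For reference, my reading of the GPyTorch docs is that `AdditiveStructureKernel` applies a *single* base kernel to every input dimension in batch and sums the per-dimension terms, rather than wrapping an explicit sum of kernels. A minimal sketch of that documented usage, reusing `train_x` from above:

```python
# Minimal sketch of the documented AdditiveStructureKernel usage:
# one base kernel is evaluated on every input dimension in batch,
# and the per-dimension covariances are summed.
from gpytorch.kernels import AdditiveStructureKernel, MaternKernel

covar_module = AdditiveStructureKernel(
    base_kernel=MaternKernel(nu=2.5),  # applied to each dimension separately
    num_dims=train_x.shape[1],         # 6912 one-dimensional additive terms
)
```

If that reading is right, wrapping a 27-term sum kernel as above may defeat the batching that makes `AdditiveStructureKernel` fast.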
By the way, I followed what GPy does; I will paste their code here as well:
```python
def _create_model_sub(self, X, Y, input_dim_permutate_list):
    """
    Creates the model for a subspace of dimensions
    :param X: observed input data
    :param Y: observed output data
    :param input_dim_permutate_list: shuffled input dimension list
    """
    # split the input dimensions into n subspaces
    subspace_dim_list = split(input_dim_permutate_list, self.n_sub)
    # define the additive kernel
    for i in range(self.n_sub):  # 27
        # get active dimensions for each subspace_id
        subspace_dim = subspace_dim_list[i]
        kern_i = GPy.kern.Matern52(len(subspace_dim), variance=1., ARD=self.ARD,
                                   active_dims=subspace_dim, name=f'k{i}')
        if i == 0:
            kern = kern_i
        else:
            kern += kern_i
    print(kern)
    # define GP model
    noise_var = Y.var() * 0.01 if self.noise_var is None else self.noise_var
    sub_model = GPy.models.GPRegression(X, Y, kernel=kern, noise_var=noise_var)
    # restrict noise variance for exact evaluations of the objective
    sub_model.Gaussian_noise.constrain_fixed(1e-6, warning=False)
    # optimise the GP hyperparameters
    try:
        sub_model.optimize(optimizer=self.optimizer, max_iters=self.max_iters,
                           messages=False, ipython_notebook=False)
        model_log_likelihood = sub_model.log_likelihood()
    except Exception:
        model_log_likelihood = -100.00
    return sub_model, model_log_likelihood
```
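A possible alternative I am considering for the speed-up (a sketch, assuming the 6912 dimensions split evenly into 27 subspaces of 256): give the GPyTorch `MaternKernel` a batch dimension of size 27 and reshape the inputs so each subspace becomes one batch entry, so all 27 covariance matrices are computed in a single batched call instead of a 27-iteration Python loop:

```python
# Sketch: evaluate 27 subspace Matern kernels in one batched call.
# Assumes the 6912 input dims split evenly into 27 subspaces of 256 dims.
import torch
from gpytorch.kernels import MaternKernel

n_sub, sub_dim = 27, 256
kern = MaternKernel(nu=2.5, ard_num_dims=sub_dim,
                    batch_shape=torch.Size([n_sub]))

x = torch.randn(30, n_sub * sub_dim)
# (30, 6912) -> (27, 30, 256): each subspace becomes a batch entry
x_batched = x.view(30, n_sub, sub_dim).transpose(0, 1)
K = kern(x_batched).to_dense()  # (27, 30, 30); .evaluate() on older GPyTorch
K_additive = K.sum(dim=0)       # additive kernel: sum over subspaces
```

Embedding this in a model would need a small custom `Kernel` subclass whose forward does the reshape and sum; the point here is only that the 27 terms can be computed in one batched call.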
**Top GitHub Comments**
What exactly are you comparing here? Can you provide code examples?
Thanks for your reply @gpleiss. Yes, I have also found that additive decompositions are slower than non-additive ones. However, I have an extra question: since we all expect the GPU to run faster than the CPU, I compared GPyTorch against GPy and found that in high-dimensional spaces GPyTorch does not run faster than GPy. Have you compared the speed with other repositories? How can I improve my speed in GPyTorch? The comparison was done on 1 GPU and 28 CPUs. I have already evaluated 30 points and want to infer the next point with a Matern 5/2 kernel.
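On the GPU-vs-CPU comparison above, one general-purpose check (a sketch, not from this thread) is to time with explicit CUDA synchronization: PyTorch launches GPU kernels asynchronously, so unsynchronized timings can misrepresent either library. With only 30 training points, the solve is tiny and Python-loop overhead over 27 kernels can dominate, which also narrows the GPU's advantage. The sketch reuses `sub_model`, `mll`, `optimizer`, `train_x`, and `train_y` from the snippet above:

```python
# Sketch: fair wall-clock timing of the GPyTorch training loop on GPU.
# CUDA kernels launch asynchronously, so synchronize before reading the clock.
import time
import torch

torch.cuda.synchronize()
start = time.perf_counter()
for _ in range(1000):
    optimizer.zero_grad()
    output = sub_model(train_x)
    loss = -mll(output, train_y)
    loss.backward()
    optimizer.step()
torch.cuda.synchronize()
print(f"training took {time.perf_counter() - start:.1f} s")
```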