Coregionalized GP predict documentation (multiple inputs)
I am a little bit confused about how to invoke the predict method of the GPCoregionalizedRegression model.
This notebook refers to an extended input format, where the input array of shape (num_observations,) is extended with an additional column of ones, so we now have an input of shape (num_observations, 2). What does this second column refer to? The task index? Most importantly, how does this generalize to multiple inputs (i.e., when the tasks have multiple columns in the X array)?
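For concreteness, here is a minimal sketch of what I understand the extended format to be when each task has two input columns (the shapes and column layout here are my assumption, not taken from the notebook):

```python
import numpy as np

X_task0 = np.random.rand(5, 2)  # 5 observations of task 0, 2 input columns
X_task1 = np.random.rand(7, 2)  # 7 observations of task 1, 2 input columns

# Append the task index as an extra column, then stack the tasks:
X_ext = np.vstack([
    np.hstack([X_task0, np.zeros((5, 1))]),  # index 0 marks task 0
    np.hstack([X_task1, np.ones((7, 1))]),   # index 1 marks task 1
])
print(X_ext.shape)  # (12, 3): 2 input columns + 1 task-index column
```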
As the Coregionalized GP inherits the predict method from the GP core module, the documentation is unfortunately not up to date (it says: "The points at which to make a prediction :type Xnew: np.ndarray (Nnew x self.input_dim)").
Unfortunately, the examples module uses neither multiple inputs nor the predict method, so that didn't help either.
Any help would be greatly appreciated; I'd be happy to create a PR with an additional example for the examples module once this is resolved.
Top GitHub Comments
Hello,
Here is a script that runs both cases: GPRegression and GPCoregionalizedRegression.
```python
import numpy as np
import GPy

X = np.array([[0, 0, 0],
              [0, 0, 1],
              [0, 1, 0],
              [0, 1, 1],
              [1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]])
Y = np.array([[0, 0, 1, 0, 1, 0, 1, 1]]).T

print(X.shape)  # (8, 3)
print(Y.shape)  # (8, 1)

num_tasks = 2
num_obs, num_feats = X.shape
num_feats -= 1  # important, because the last column indicates the "task"

# --- Using GPRegression ---
# The RBF kernel is the one that takes the 2 real inputs; Coregionalize
# works on the output-index (third) dimension.
kern = GPy.kern.RBF(2) ** GPy.kern.Coregionalize(input_dim=1, output_dim=num_tasks, rank=1)
m0 = GPy.models.GPRegression(X, Y, kern)
m0.optimize()
m0.predict(X)

# --- Using ICM (there is also an LCM option if you need to pass a list of
# base kernels) ---
# ICM builds the Coregionalize part internally, so it takes only the base kernel.
lcm1 = GPy.util.multioutput.ICM(input_dim=2, num_outputs=2, kernel=GPy.kern.RBF(2))
m1 = GPy.models.GPRegression(X, Y, lcm1)
m1.optimize()
m1.predict(X)

# --- Using GPCoregionalizedRegression ---
lcm2 = GPy.util.multioutput.ICM(input_dim=2, num_outputs=2, kernel=GPy.kern.RBF(2))

# The inputs are lists of per-task inputs and outputs, without the
# artificial feature added as the third column.
X_list = [X[X[:, 2] == i, :2] for i in range(2)]
Y_list = [Y[X[:, 2] == i] for i in range(2)]

m2 = GPy.models.GPCoregionalizedRegression(X_list, Y_list, kernel=lcm2)
m2.optimize()

# At prediction time the input is not a list but an array, and it includes
# the third column with the task index.
new_X = X
m2.predict(new_X, Y_metadata={'output_index': np.ones((new_X.shape[0], 1)).astype(int)})

# Predictions need us to tell which noise model we are using (again, an
# output index per point).
m2.predict(new_X, Y_metadata={'output_index': np.zeros((new_X.shape[0], 1)).astype(int)})
```
The output_index doesn't need to be all zeros or all ones; it can be a combination. But it has to be a combination of zeros and ones only (i.e., the output indices used to train the model).
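For instance, different rows can use different noise models in a single call. A minimal sketch continuing the script above (the particular index pattern is arbitrary):

```python
# A mixed output_index: rows flagged 0 use task 0's noise model,
# rows flagged 1 use task 1's. Any mix of 0s and 1s is valid here.
mixed_index = np.array([[0], [0], [1], [1], [0], [0], [1], [1]])
m2.predict(new_X, Y_metadata={'output_index': mixed_index})
```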
Some notes:
1) The difference between using GPRegression with an ICM/LCM kernel vs. GPCoregionalizedRegression: the first assumes the noise variance is the same for both outputs, while the latter assumes the noise is different for each output. Have a look at print(m1) vs print(m2). This is important because at some point we were aiming to have different noise models for the different outputs; for example, one output could be Gaussian and the other could be Bernoulli (I like that idea a lot!).
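To see the difference, a quick sketch (the parameter paths below follow GPy's default naming and may differ between versions):

```python
# m1 (GPRegression + ICM kernel) has a single shared noise variance:
print(m1.Gaussian_noise.variance)

# m2 (GPCoregionalizedRegression) uses a mixed-noise likelihood with one
# Gaussian noise variance per output:
print(m2.mixed_noise.Gaussian_noise_0.variance)
print(m2.mixed_noise.Gaussian_noise_1.variance)
```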
2) It is confusing that GPCoregionalizedRegression takes lists of two-column arrays, but prediction takes a single three-column array. Yes, that is an issue that probably needs to change.
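As a stopgap, GPy ships a helper for going from the list format to the stacked format (assuming GPy.util.multioutput.build_XY exists in your version):

```python
# Stack the per-task lists and append the task index as the last column.
X_stacked, Y_stacked, index = GPy.util.multioutput.build_XY(X_list, Y_list)
print(X_stacked.shape)  # (8, 3): two input columns plus the task index
```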
3) It is also confusing that we need to pass metadata with the output_index if we are already stacking the third column onto X. Yes, but again, this framework was being developed to allow combining different noise models. The noise model is handled by the Likelihood object, while the output correlation is handled by the kernel, and the two are independent of each other. So in the end this redundancy was needed.
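A practical consequence of the redundancy is that the metadata can simply be read off the stacked index column (continuing the script above):

```python
# The task index already sits in the last column of new_X, so the
# noise-model index can be derived straight from it.
m2.predict(new_X, Y_metadata={'output_index': new_X[:, -1:].astype(int)})
```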
I hope this helps.
Ricardo
On Tue, May 1, 2018 at 2:48 PM, janvanrijn notifications@github.com wrote:
Hi Ricardo,
Ouch, rookie mistake. Thanks for pointing this out!
Duly noted.
(Not my exact wording) Initially I couldn't see how the 1D case generalized to an N-D case. Now I do, and it completely makes sense.
Thanks again for posting this example. From my POV this issue can be closed.