Move lengthscale outside of kernel
Proposal: kernels are not responsible for lengthscales. We can introduce a separate scaling module that divides the data by the lengthscales before feeding the data into the kernel.
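A minimal sketch of what such a module might look like (the name `LengthscaleModule` and its interface are hypothetical, not an existing GPyTorch API):

```python
import torch


class LengthscaleModule(torch.nn.Module):
    """Hypothetical scaling module that owns the (possibly ARD)
    lengthscale and rescales inputs before they reach a kernel."""

    def __init__(self, ard_num_dims=1):
        super().__init__()
        # One raw lengthscale per input dimension (ARD), or a single
        # shared one when ard_num_dims=1
        self.register_parameter(
            "raw_lengthscale",
            torch.nn.Parameter(torch.zeros(1, ard_num_dims)),
        )

    @property
    def lengthscale(self):
        # Softplus keeps the lengthscale positive
        return torch.nn.functional.softplus(self.raw_lengthscale)

    def forward(self, x):
        # x: n x d -- divide each input dimension by its lengthscale
        return x / self.lengthscale
```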
Reasoning: we do some batch-dimension hacking to get fast kernel diagonals, as well as fast batch kernels. For kernel diagonals, we transform the `n x d` data into `n x 1 x d` data, which then computes only the kernel diagonals. For additive/multiplicative kernels, we transform the `n x d` data into `d x n x 1` data.
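For concreteness, here is a shape-only illustration of both tricks with a stationary kernel that broadcasts over leading batch dimensions (plain PyTorch; not GPyTorch's actual implementation):

```python
import torch


def rbf(x1, x2):
    # Stationary kernel that broadcasts over leading batch dimensions:
    # ... x n x d and ... x m x d inputs give a ... x n x m covariance.
    sq_dist = (x1.unsqueeze(-2) - x2.unsqueeze(-3)).pow(2).sum(-1)
    return torch.exp(-0.5 * sq_dist)


n, d = 5, 3
x = torch.randn(n, d)

full = rbf(x, x)  # n x n full covariance

# Kernel diagonal: view the data as n x 1 x d, i.e. n "batches" of one
# point each; the kernel then returns n covariances of size 1 x 1.
diag = rbf(x.unsqueeze(-2), x.unsqueeze(-2)).squeeze()  # n

# Additive/multiplicative kernels: view the data as d x n x 1, i.e. one
# batch per input dimension, each seeing a one-dimensional input.
x_per_dim = x.transpose(-2, -1).unsqueeze(-1)  # d x n x 1
per_dim = rbf(x_per_dim, x_per_dim)  # d x n x n, one matrix per dimension
```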
There is a problem when we are using an ARD option for kernels, or when we have separate lengthscales for the different batches: the lengthscale's shape no longer lines up with the reshaped data. If this lengthscale scaling happens before the data enters the kernel, the problem is mitigated.
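The shape conflict is easy to see directly. An ARD lengthscale of shape `1 x d` baked into the kernel lines up with `n x d` data, but not with the `d x n x 1` view used for additive kernels (a shape-only sketch):

```python
import torch

n, d = 5, 3
x = torch.randn(n, d)
lengthscale = torch.rand(1, d)  # ARD: one lengthscale per dimension

x.div(lengthscale)  # fine: n x d broadcasts against 1 x d

x_per_dim = x.transpose(-2, -1).unsqueeze(-1)  # d x n x 1
x_per_dim.div(lengthscale)  # wrong: broadcasts to d x n x d

# Dividing by the lengthscale *before* the reshape avoids the issue:
x_scaled = x.div(lengthscale)  # n x d
x_scaled_per_dim = x_scaled.transpose(-2, -1).unsqueeze(-1)  # d x n x 1
```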
In general, this would introduce a convention that kernels should not define their own parameters (which is already the case with output scales).
Rolling in the change: if we're all on board with this, we will deprecate kernel lengthscales. We will encourage users to use the `lengthscale` module and to initialize kernels with `lengthscale=False`. When we're ready for a major release (and remove lengthscales completely from kernels), the `lengthscale=False` kwarg won't be necessary any more.
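During the transition, user code might look roughly like this. Note that `LengthscaleModule` is the hypothetical module sketched above, and the `lengthscale=False` kwarg is the one proposed in this issue, not a released API:

```python
import torch
from gpytorch.kernels import RBFKernel

x = torch.randn(5, 3)

# Deprecation period: the kernel owns no lengthscale; scaling happens
# outside, in the hypothetical LengthscaleModule sketched above.
scaling = LengthscaleModule(ard_num_dims=3)
kernel = RBFKernel(lengthscale=False)  # proposed kwarg from this issue
covar = kernel(scaling(x), scaling(x))

# After the major release, kernels would have no lengthscale at all:
# kernel = RBFKernel()
```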
Top GitHub Comments
So one use case is to have different lengthscales for different tasks in a multi-task GP. Do you think it would be easy to implement this under the new proposal?
cc @rajkumarkarthik, @darbour
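One way this could fall out of the proposal almost for free: give the scaling module a batch of lengthscales, one per task, and let broadcasting produce a task-batched view of the data (a sketch under the same hypothetical interface as above):

```python
import torch

num_tasks, n, d = 4, 5, 3
x = torch.randn(n, d)

# One ARD lengthscale per task: num_tasks x 1 x d
task_lengthscales = torch.rand(num_tasks, 1, d)

# Broadcasting yields a task-batched view of the scaled data,
# num_tasks x n x d, which a batch-aware kernel turns into
# num_tasks x n x n covariance matrices.
x_scaled = x / task_lengthscales
```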
Why would an LCM kernel be less computationally burdensome than having task-wise lengthscales? In terms of kernel evaluation, the task-specific lengthscales should not make any difference; only the hyperparameter space would be higher-dimensional during fitting.
Regardless, having an LCM kernel would be useful, if only as a baseline to compare against. #261