FixedNoiseGaussianLikelihood with z-scored data
Howdy folks,
I am doing some sensor/data fusion using an exact single-task regression GP. The targets come from various sensors, each of which has its own 1-sigma uncertainty. I am grouping the targets by sensor, for sensors 1 through n:
Y = [y1, y1, ..., y1, y2, ..., y2, ..., yn, yn, ..., yn]
I group my 1-sigma noises the same way:
Y_std = [d1, d1, ..., d1, d2, d2, ..., d2, ..., dn, dn, ..., dn]
where d1 is the 1-sigma noise of sensor 1, d2 the 1-sigma noise of sensor 2, etc.
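For illustration, a minimal NumPy sketch of this grouped layout. The three sensors, their readings, and their noise levels are made-up values; the only point is that Y_std holds one entry per reading, aligned with Y.

```python
import numpy as np

# Hypothetical example: three sensors with different 1-sigma noise levels.
# readings[k] holds the targets reported by sensor k; sigmas[k] is its 1-sigma noise.
readings = [
    np.array([1.2, 0.9, 1.1]),       # sensor 1
    np.array([2.0, 2.3]),            # sensor 2
    np.array([0.5, 0.7, 0.6, 0.4]),  # sensor 3
]
sigmas = [0.1, 0.5, 0.2]

# Concatenate the targets sensor by sensor: [y1, ..., y1, y2, ..., y2, ...]
Y = np.concatenate(readings)

# Repeat each sensor's 1-sigma noise once per reading so Y_std lines up with Y.
Y_std = np.concatenate([np.full(len(r), s) for r, s in zip(readings, sigmas)])
```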
Then I am z-scoring my data:
Y' = (Y - mean(Y))/std(Y)
X' = (X - mean(X))/std(X)
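A small sketch of the z-scoring step that also keeps the statistics needed to undo it later. The helper name `zscore` and the sample arrays are hypothetical. The target scale is stored as `Y_scale` so it is not confused with the sensor noise `Y_std`. NumPy's `std` defaults to the population estimate (`ddof=0`), while `torch.std` defaults to the unbiased one; either works as long as the same value is reused wherever the targets' scale appears.

```python
import numpy as np

def zscore(a):
    """Return (a - mean) / std along axis 0, plus the mean and std used."""
    mean = a.mean(axis=0)
    std = a.std(axis=0)  # population std (ddof=0)
    return (a - mean) / std, mean, std

# Hypothetical inputs (one feature column) and targets.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
Y = np.array([1.0, 2.0, 4.0, 5.0])

X_z, X_mean, X_scale = zscore(X)
Y_z, Y_mean, Y_scale = zscore(Y)  # Y_scale is std(Y), not the sensor noise
```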
Then, before passing Y_std**2 as the noise argument of the FixedNoiseGaussianLikelihood constructor, I obviously need to adjust it correctly for the z-scoring done on the targets.
Finding the right way to do this is the main point of this thread.
I started out by z-scoring the sensor standard deviations using the mean and std of the target data:
Y_std' = (Y_std - mean(Y))/std(Y)
However, this doesn’t feel right to me, and I get counterintuitive results when I experiment with different settings for the sensor noise values.
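A quick numeric check with hypothetical values shows one reason the first attempt misbehaves. Subtracting mean(Y) treats a spread as if it were a location, so the "standard deviations" can come out negative and the ratio between sensors is distorted. Squaring later would hide the sign, but the magnitudes would still be wrong.

```python
import numpy as np

# Hypothetical targets and per-point 1-sigma sensor noise (same layout as Y and Y_std).
Y = np.array([1.2, 0.9, 1.1, 2.0, 2.3])
Y_std = np.array([0.1, 0.1, 0.1, 0.5, 0.5])

mu, sigma = Y.mean(), Y.std()

# First attempt: shift and scale the noise as if it were a target value.
shifted = (Y_std - mu) / sigma   # every entry is negative here, not a valid std

# Scale only: the noise is a spread, so only the division applies.
scaled = Y_std / sigma           # positive, and the 0.5 / 0.1 = 5 ratio is kept
```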
I changed this to only dividing the noise by the std of the targets:
Y_std' = Y_std/std(Y)
which appears to work a lot better, and now the noise and the targets undergo the same scaling.
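A short derivation supports the scale-only approach, assuming the usual additive Gaussian observation model with known per-point noise d_i, and writing \mu_Y = mean(Y), \sigma_Y = std(Y):

```latex
% Assumed observation model: each reading is a latent value plus sensor noise.
y_i = f(x_i) + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, d_i^2)

% Z-scoring the targets splits into a shifted latent function and scaled noise:
y_i' = \frac{y_i - \mu_Y}{\sigma_Y}
     = \underbrace{\frac{f(x_i) - \mu_Y}{\sigma_Y}}_{f'(x_i)}
     + \underbrace{\frac{\varepsilon_i}{\sigma_Y}}_{\varepsilon_i'},
\qquad \varepsilon_i' \sim \mathcal{N}\!\left(0, \frac{d_i^2}{\sigma_Y^2}\right)
```

The shift \mu_Y is absorbed entirely by the latent function, so it never reaches the noise; only the scale \sigma_Y does. The variance to pass on the z-scored scale is therefore d_i^2/\sigma_Y^2, i.e. (Y_std/std(Y))**2.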
Z-scoring the sensor noise using its own mean and std didn’t feel like the right thing to do, as I was afraid I would lose the correct relationship to the target data (essentially I’d be scaling the targets and the sensor noise by different values, which doesn’t feel right to me).
So my question is: what is the right way to adjust the sensor noises before passing them as variances to the FixedNoiseGaussianLikelihood?
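A minimal end-to-end sketch of the scale-only adjustment, plus the inverse map for predictions. The helper names `zscore_targets_and_noise` and `unscale_prediction` and the sample values are hypothetical. In GPyTorch, the resulting variance vector would be converted with `torch.as_tensor` (matching the dtype of the training targets) and passed as the `noise` argument of `gpytorch.likelihoods.FixedNoiseGaussianLikelihood`, which expects variances rather than standard deviations.

```python
import numpy as np

def zscore_targets_and_noise(Y, Y_std):
    """Z-score the targets and rescale per-point 1-sigma noise to match.

    Returns the z-scored targets, the noise *variances* on the z-scored
    scale, and the (mean, scale) pair needed to map predictions back.
    """
    mu, sigma = Y.mean(), Y.std()
    Y_z = (Y - mu) / sigma
    # The noise is a spread, so only the scale applies: var' = (d / sigma)^2.
    noise_var_z = (Y_std / sigma) ** 2
    return Y_z, noise_var_z, mu, sigma

def unscale_prediction(mean_z, var_z, mu, sigma):
    """Map a mean/variance from the z-scored scale back to sensor units."""
    return mean_z * sigma + mu, var_z * sigma ** 2

# Hypothetical targets and per-point sensor noise.
Y = np.array([1.2, 0.9, 1.1, 2.0, 2.3])
Y_std = np.array([0.1, 0.1, 0.1, 0.5, 0.5])
Y_z, noise_var_z, mu, sigma = zscore_targets_and_noise(Y, Y_std)

# Round trip: the rescaled noise variances map back to the original d_i^2.
_, var_back = unscale_prediction(0.0, noise_var_z, mu, sigma)
```

Because every point is divided by the same std(Y), the relative weighting between sensors is unchanged, which is exactly what a fixed-noise likelihood relies on.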
Thanks
Galto
Created 4 years ago; 8 comments (1 by maintainers).
Top GitHub Comments
Yes, that’s true. I guess Geoff is wondering to what extent that’s necessary (if the sensors aren’t all that different it may not be).
Yes, it is.
Thanks @Balandat
Gotcha, so yes, in my case there are very good reasons to “weigh” one data source over another 😃 (or at least I think so).