
What GP module to use in reinforcement learning settings

See original GitHub issue

A little background: I am trying to implement a Bayesian RL algorithm for continuous environments. For this, I plan on replacing the last layer of the Critic network with a GP layer with the hope that this makes the algorithm significantly more sample-efficient.

For training, the network samples a fresh set of datapoints for each update. As I cannot get my hands on the training data beforehand, I don’t think Exact inference is possible. For variational inference, what do I provide the num_data parameter as? Should I provide num_data as the size of the freshly sampled batch (as the previous samples are discarded) or the total number of datapoints (sample_size x num_samples)? I am asking this because varying this parameter considerably affects the performance of the network.
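(For reference: with minibatches, the variational ELBO's expected log-likelihood term is rescaled by num_data / batch_size, so num_data controls how heavily the data-fit term counts against the KL penalty. A minimal pure-Python sketch of that estimator, using made-up per-point log-likelihoods and KL value rather than real GPyTorch quantities:)

```python
def minibatch_elbo(batch_log_liks, kl, num_data):
    """Minibatch ELBO estimate: the batch's summed expected log-likelihood
    is rescaled by num_data / batch_size so the data-fit term is on the
    same scale as the full-dataset KL(q || p) term."""
    batch_size = len(batch_log_liks)
    fit = (num_data / batch_size) * sum(batch_log_liks)
    return fit - kl

batch = [-1.5, -0.5, -1.0, -1.0]  # toy per-point expected log-likelihoods
kl = 2.0                          # toy KL divergence

# num_data = batch size: the fresh batch is treated as the whole dataset
print(minibatch_elbo(batch, kl, num_data=4))    # -6.0
# num_data = all points ever sampled: the fit term swamps the KL term
print(minibatch_elbo(batch, kl, num_data=400))  # -402.0
```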

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
jacobrgardner commented, Mar 29, 2019

You could use the same object as long as you updated num_data appropriately.

I would not personally recommend viewing num_data as a hyperparameter to be tuned. The issue is that it’s effectively controlling the normalization of the ELBO, which has a specific statistical interpretation.

If you modify that normalization, you may get better performance in the sense that you’ll effectively be weighting the “model fit” term more or less heavily. It would be hard to justify the change, however – it’s kind of like saying you get better performance by making this probability distribution sum to 3 instead of 1. At the end of the day, it’s your model and your choice though 😄!
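To make that point concrete, here is a toy calculation (made-up numbers, not GPyTorch output) showing how inflating num_data reweights the data-fit term relative to the KL regularizer:

```python
# Toy numbers illustrating how num_data rescales the ELBO's data-fit term.
batch_size = 32
avg_log_lik = -1.0  # hypothetical average expected log-likelihood per point
kl = 10.0           # hypothetical KL(q || p) of the variational distribution

for num_data in (32, 1024, 32 * 1024):
    fit_weight = num_data / batch_size     # multiplier on the batch's fit term
    elbo = num_data * avg_log_lik - kl     # = fit_weight * (batch_size * avg_log_lik) - kl
    print(f"num_data={num_data}: fit weight {fit_weight}x, ELBO {elbo}")
```

With num_data equal to the batch size, the fit and KL terms are balanced as the ELBO intends; setting it orders of magnitude larger makes the KL penalty nearly irrelevant, which is exactly the kind of unprincipled reweighting the comment warns against.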

1 reaction
Akella17 commented, Mar 29, 2019

Yes, exactly. And when you say recreate the VariationalELBO, should I create a new object each time and copy over the learned mean and covariances, or is it okay to reuse the previous object? In other words, would a single VariationalELBO object that is trained on a fresh batch of data each time work?

If num_data can be either sample_size or total_size, why not something in between (I understand it would not make sense theoretically)? The way I see it, num_data affects the performance, which makes it a hyperparameter that needs to be tuned accordingly.

Read more comments on GitHub >

Top Results From Across the Web

Reinforcement Learning in 3 Hours | Full Course using Python
Want to get started with Reinforcement Learning? This is the course for you! This course will take you through all of the fundamentals ...
Read more >
Three Things to Know About Reinforcement Learning
The training algorithm is responsible for tuning the agent's policy based on the collected sensor readings, actions, and rewards. After training ...
Read more >
Automated Reinforcement Learning: An Overview - arXiv
Automated Reinforcement Learning (AutoRL) provides a framework to automatically make appropriate decisions about the settings of an RL ...
Read more >
A parallel multi-module deep reinforcement learning algorithm ...
Conclusion and future work: Considering the characteristics of these data, we propose a novel DRL algorithm, called Parallel Multi-Module ...
Read more >
Effective deep Q-networks (EDQN) strategy for resource ...
Particle Swarm Optimization (PSO) is used to optimize reinforcement learning. PSO shows significant advantages when compared to other ...
Read more >
