What GP module to use in reinforcement learning settings
A little background: I am trying to implement a Bayesian RL algorithm for continuous environments. For this, I plan to replace the last layer of the Critic network with a GP layer, in the hope that this makes the algorithm significantly more sample-efficient.
For training, the network samples a fresh set of datapoints for each update. Since I cannot get my hands on the training data beforehand, I don't think exact inference is possible. For variational inference, what do I provide as the `num_data` parameter? Should I provide `num_data` as the size of the freshly sampled batch (as the previous samples are discarded), or the total number of datapoints (`sample_size x num_samples`)? I am asking because varying this parameter considerably affects the performance of the network.
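For concreteness, here is a minimal sketch of the kind of setup being described, using GPyTorch's `ApproximateGP` and `VariationalELBO`. The class and variable names (`GPCriticHead`, `NUM_INDUCING`, `FEATURE_DIM`, `BATCH_SIZE`, the random inducing points) are illustrative assumptions, not taken from the thread.

```python
# Minimal SVGP sketch of the setup in question (illustrative names).
import torch
import gpytorch


class GPCriticHead(gpytorch.models.ApproximateGP):
    """GP layer intended to replace the last layer of a critic network."""

    def __init__(self, inducing_points):
        variational_distribution = gpytorch.variational.CholeskyVariationalDistribution(
            inducing_points.size(0)
        )
        variational_strategy = gpytorch.variational.VariationalStrategy(
            self, inducing_points, variational_distribution, learn_inducing_locations=True
        )
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


NUM_INDUCING, FEATURE_DIM, BATCH_SIZE = 64, 32, 256
model = GPCriticHead(torch.randn(NUM_INDUCING, FEATURE_DIM))
likelihood = gpytorch.likelihoods.GaussianLikelihood()

# num_data is the argument being asked about: the size of the freshly sampled
# batch, or the total number of datapoints generated over training?
mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=BATCH_SIZE)
```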
Top GitHub Comments
You could use the same object as long as you updated `num_data` appropriately.

I would not personally recommend viewing `num_data` as a hyperparameter to be tuned. The issue is that it's effectively controlling the normalization of the ELBO, which has a specific statistical interpretation. If you modify that normalization, you may get better performance in the sense that you'll effectively be weighting the "model fit" term more or less heavily. It would be hard to justify the change, however – it's kind of like saying you get better performance by making a probability distribution sum to 3 instead of 1. At the end of the day, it's your model and your choice though 😄!
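A hedged sketch of that suggestion, continuing the example above (`model`, `likelihood`, `mll` as defined there). It assumes `num_data` is stored as a settable attribute on the ELBO object; if that does not hold in your GPyTorch version, recreating the `VariationalELBO` each update is just as cheap, since the learned parameters live in `model` and `likelihood`, not in the ELBO object. `sample_batch()` is a hypothetical stand-in for the RL rollout/replay sampler.

```python
# Reuse a single VariationalELBO across RL updates, keeping num_data current.
# Assumes mll.num_data is a settable attribute; otherwise rebuild the ELBO each update.
optimizer = torch.optim.Adam(
    list(model.parameters()) + list(likelihood.parameters()), lr=1e-3
)

num_updates = 1_000  # illustrative
total_seen = 0
for update in range(num_updates):
    batch_x, batch_y = sample_batch()      # hypothetical: fresh batch each update
    total_seen += batch_x.size(0)

    mll.num_data = batch_x.size(0)         # or total_seen, per the discussion above
    # Conservative alternative with the same effect:
    # mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=batch_x.size(0))

    optimizer.zero_grad()
    loss = -mll(model(batch_x), batch_y)   # negative ELBO as the loss
    loss.backward()
    optimizer.step()
```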
Yes, exactly. And when you say recreate the `VariationalELBO`, should I be creating a new object each time and copying over the learned mean and covariances, or is it okay to reuse the previous object? In other words, would training a single `VariationalELBO` object on a new batch of data each time work?
When `num_data` can be either `sample_size` or `total_size`, why not something in between (I understand it would not make sense theoretically)? The way I see it, the `num_data` parameter affects the performance, which makes it a hyperparameter that needs to be tuned accordingly.
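For reference, `num_data` plays the role of $N$, the full dataset size, in the standard stochastic variational GP bound (Hensman et al.) that `VariationalELBO` is based on:

$$
\mathcal{L} \;\approx\; \frac{N}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \mathbb{E}_{q\left(f(\mathbf{x}_i)\right)}\!\left[\log p\!\left(y_i \mid f(\mathbf{x}_i)\right)\right] \;-\; \mathrm{KL}\!\left[\,q(\mathbf{u}) \,\|\, p(\mathbf{u})\,\right],
$$

where $\mathcal{B}$ is the current minibatch (GPyTorch's implementation may divide the whole bound by $N$, which does not change the relative weighting). Choosing an "in between" value of $N$ simply rescales the data-fit term relative to the KL regularizer, which is the normalization issue described in the maintainer's comment above.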