
What GP module to use in reinforcement learning settings

See original GitHub issue

A little background: I am trying to implement a Bayesian RL algorithm for continuous environments. For this, I plan on replacing the last layer of the Critic network with a GP layer with the hope that this makes the algorithm significantly more sample-efficient.

For training, the network samples a fresh set of datapoints for each update. As I cannot get my hands on the training data beforehand, I don’t think Exact inference is possible. For variational inference, what do I provide the num_data parameter as? Should I provide num_data as the size of the freshly sampled batch (as the previous samples are discarded) or the total number of datapoints (sample_size x num_samples)? I am asking this because varying this parameter considerably affects the performance of the network.
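(For reference: with minibatches, the variational ELBO's expected log-likelihood term is rescaled by num_data / batch_size, so num_data controls how heavily the data-fit term counts against the KL penalty. A minimal pure-Python sketch of that estimator, using made-up per-point log-likelihoods and KL value rather than real GPyTorch quantities:)

```python
def minibatch_elbo(batch_log_liks, kl, num_data):
    """Minibatch ELBO estimate: the batch's summed expected log-likelihood
    is rescaled by num_data / batch_size so the data-fit term is on the
    same scale as the full-dataset KL(q || p) term."""
    batch_size = len(batch_log_liks)
    fit = (num_data / batch_size) * sum(batch_log_liks)
    return fit - kl

batch = [-1.5, -0.5, -1.0, -1.0]  # toy per-point expected log-likelihoods
kl = 2.0                          # toy KL divergence

# num_data = batch size: the fresh batch is treated as the whole dataset
print(minibatch_elbo(batch, kl, num_data=4))    # -6.0
# num_data = all points ever sampled: the fit term swamps the KL term
print(minibatch_elbo(batch, kl, num_data=400))  # -402.0
```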

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
jacobrgardner commented, Mar 29, 2019

You could use the same object as long as you updated num_data appropriately.

I would not personally recommend viewing num_data as a hyperparameter to be tuned. The issue is that it’s effectively controlling the normalization of the ELBO, which has a specific statistical interpretation.

If you modify that normalization, you may get better performance in the sense that you’ll effectively be weighting the “model fit” term more or less heavily. It would be hard to justify the change, however – it’s kind of like saying you get better performance by making this probability distribution sum to 3 instead of 1. At the end of the day, it’s your model and your choice though 😄!
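To make that point concrete, here is a toy calculation (made-up numbers, not GPyTorch output) showing how inflating num_data reweights the data-fit term relative to the KL regularizer:

```python
# Toy numbers illustrating how num_data rescales the ELBO's data-fit term.
batch_size = 32
avg_log_lik = -1.0  # hypothetical average expected log-likelihood per point
kl = 10.0           # hypothetical KL(q || p) of the variational distribution

for num_data in (32, 1024, 32 * 1024):
    fit_weight = num_data / batch_size     # multiplier on the batch's fit term
    elbo = num_data * avg_log_lik - kl     # = fit_weight * (batch_size * avg_log_lik) - kl
    print(f"num_data={num_data}: fit weight {fit_weight}x, ELBO {elbo}")
```

With num_data equal to the batch size, the fit and KL terms are balanced as the ELBO intends; setting it orders of magnitude larger makes the KL penalty nearly irrelevant, which is exactly the kind of unprincipled reweighting the comment warns against.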

1 reaction
Akella17 commented, Mar 29, 2019

Yes, exactly. And when you say recreate the VariationalELBO, should I create a new object each time and copy over the learned mean and covariances, or is it okay to reuse the previous object? In other words, would a single VariationalELBO object that is trained on a fresh batch of data each time work?

If num_data can be either sample_size or total_size, why not something in between (I understand it would not make sense theoretically)? The way I see it, num_data affects the performance, which makes it a hyperparameter that needs to be tuned accordingly.

Read more comments on GitHub >

Top Results From Across the Web

Reinforcement Learning in 3 Hours | Full Course using Python
Want to get started with Reinforcement Learning? This is the course for you! This course will take you through all of the fundamentals ...
Read more >
Three Things to Know About Reinforcement Learning
The training algorithm is responsible for tuning the agent's policy based on the collected sensor readings, actions, and rewards. After training ...
Read more >
Automated Reinforcement Learning: An Overview - arXiv
Automated Reinforcement Learning (AutoRL) provides a framework to automatically make appropriate decisions about the settings of an RL ...
Read more >
A parallel multi-module deep reinforcement learning algorithm ...
Conclusion and future work: Considering the characteristics of these data, we propose a novel DRL algorithm, called Parallel Multi-Module ...
Read more >
Effective deep Q-networks (EDQN) strategy for resource ...
Particle Swarm Optimization (PSO) is used to optimize reinforcement learning. PSO shows significant advantages when compared to other ...
Read more >
