[question] Training sequential tasks on multiple instances of an environment
I want to train an agent to reach point B from point A and then reach point C from point B. The idea is to train two separate agents: one learns the A -> B move and the other learns the B -> C move. There are several ways to do this:
- Create two environment instances and in the second one, initialize the second agent’s position randomly around point B
- Create two environment instances, and initialize the second agent’s position in the second environment to the last point the first agent visited in the first environment. For this, we would train agent 1 for one episode, then train agent 2 for one episode, and repeat this loop over and over (a rough sketch of this is shown after the question below).
Is it possible to implement the second idea with stable_baselines?
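Below is a rough, untested sketch of what the second idea could look like with stable_baselines (v2 API). Everything environment-related here is a placeholder: `PointEnv` is a made-up toy environment, and the `agent_position` info key, the coordinates, and the step budgets are assumptions; substitute your own environment and hyperparameters.

```python
import gym
import numpy as np
from stable_baselines import PPO2


class PointEnv(gym.Env):
    """Toy placeholder env: a 2-D point that should reach goal_pos from start_pos."""

    def __init__(self, start_pos, goal_pos):
        super().__init__()
        self.start_pos = np.asarray(start_pos, dtype=np.float32)
        self.goal_pos = np.asarray(goal_pos, dtype=np.float32)
        self.observation_space = gym.spaces.Box(-10.0, 10.0, shape=(2,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self):
        self.pos = self.start_pos.copy()
        self.n_steps = 0
        return self.pos.copy()

    def step(self, action):
        self.pos = np.clip(self.pos + 0.1 * np.asarray(action), -10.0, 10.0)
        self.n_steps += 1
        dist = float(np.linalg.norm(self.pos - self.goal_pos))
        done = dist < 0.1 or self.n_steps >= 200
        return self.pos.copy(), -dist, done, {'agent_position': self.pos.copy()}


POINT_A, POINT_B, POINT_C = (0.0, 0.0), (5.0, 0.0), (5.0, 5.0)  # made-up coordinates
STEPS_PER_ROUND = 2048

env_1 = PointEnv(start_pos=POINT_A, goal_pos=POINT_B)     # agent 1 learns A -> B
env_2 = PointEnv(start_pos=POINT_B, goal_pos=POINT_C)     # agent 2 learns B -> C
eval_env = PointEnv(start_pos=POINT_A, goal_pos=POINT_B)  # separate copy for rollouts
agent_1 = PPO2('MlpPolicy', env_1, verbose=0)
agent_2 = PPO2('MlpPolicy', env_2, verbose=0)

for _ in range(10):
    # Train agent 1 for a while, then roll it out once to see where it ends up.
    agent_1.learn(total_timesteps=STEPS_PER_ROUND)
    obs, done, info = eval_env.reset(), False, {}
    while not done:
        action, _ = agent_1.predict(obs)
        obs, _, done, info = eval_env.step(action)

    # Make agent 2's next episodes start from agent 1's final position, then train it.
    env_2.start_pos = np.asarray(info['agent_position'], dtype=np.float32)
    agent_2.learn(total_timesteps=STEPS_PER_ROUND)
```

Note that this calls `learn()` repeatedly, which the maintainer comment below advises against (re-initialization and loss of optimizer statistics), so treat it as a starting point rather than a recommended pattern.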
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Right, yes, I did that once and it took a long time to learn anything…
When I tried to do something similar for hierarchical algorithms, I used generators.
@mhtb32 If you are comfortable with the source code, you can use generators. You have to change the `learn` function to call `yield` when you reach the target. Then it boils down to alternating between the two generators, but you will have to slightly modify the source.
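The commenter's own snippet is not shown here; the following is only a hedged illustration of the generator idea, reusing `PointEnv`, `PPO2`, `np`, and the `POINT_*` coordinates from the sketch under the question above. It shows the control flow only: in a real implementation the `yield` would live inside a modified copy of `learn()` so that gradient updates still happen between episodes.

```python
def run_until_target(model, env):
    """Generator: roll the policy out and yield the final position each time
    the target (episode end) is reached. In a modified learn() the training
    update would happen before the yield; this sketch shows control flow only."""
    while True:
        obs, done, info = env.reset(), False, {}
        while not done:
            action, _ = model.predict(obs)
            obs, _, done, info = env.step(action)
        yield info['agent_position']


env_1 = PointEnv(start_pos=POINT_A, goal_pos=POINT_B)
env_2 = PointEnv(start_pos=POINT_B, goal_pos=POINT_C)
agent_1 = PPO2('MlpPolicy', env_1, verbose=0)
agent_2 = PPO2('MlpPolicy', env_2, verbose=0)

gen_1 = run_until_target(agent_1, env_1)
gen_2 = run_until_target(agent_2, env_2)

for _ in range(100):
    last_point = next(gen_1)   # agent 1 runs until it reaches (or times out near) B
    env_2.start_pos = np.asarray(last_point, dtype=np.float32)
    next(gen_2)                # agent 2 then continues from that point
```

To make this actually train rather than just roll out, the `yield` (or an equivalent callback) would have to be placed inside the library's learn loop, which is the source modification the commenter mentions.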
Hmm it might be possible, but we do not recommend calling `train` repeatedly in a row (it might redo initializations and such every time, possibly leaking memory and also erasing any optimizer statistics).