Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

[question] Training sequential tasks on multiple instances of an environment

See original GitHub issue

I want to train an agent to reach point B from point A and then reach point C from point B. The idea is to train two separate agents, one of which learns the A -> B move while the other learns the B -> C move. There are multiple ways to do this:

  1. Create two environment instances and in the second one, initialize the second agent’s position randomly around point B
  2. Create two environment instances, and initialize the second agent’s position in the second environment to the last point the first agent visited in the first environment. For this, we need to train agent 1 for one episode, then train agent 2 for one episode, and repeat this loop.

Is it possible to implement the second idea with stable_baselines?
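For clarity, here is a minimal sketch of idea 2 using plain Python stand-ins for the two environments (no stable_baselines dependency; the class and method names below are hypothetical, not from any library):

```python
class SegmentEnv:
    """Toy 1-D environment: the agent walks from `start` toward `goal`."""

    def __init__(self, start, goal):
        self.start = start
        self.goal = goal
        self.pos = start

    def reset(self, start=None):
        # Allow the caller to override the start position, e.g. with the
        # final state reached in a previous environment.
        self.pos = self.start if start is None else start
        return self.pos

    def step(self, action):
        self.pos += action
        done = self.pos >= self.goal
        return self.pos, done


def run_episode(env, policy, start=None):
    obs = env.reset(start=start)
    done = False
    while not done:
        obs, done = env.step(policy(obs))
    return obs  # final state, handed to the next segment


env_ab = SegmentEnv(start=0, goal=5)    # A -> B
env_bc = SegmentEnv(start=5, goal=10)   # B -> C

policy = lambda obs: 1  # placeholder for the learned policies

# Alternate one episode per agent; env_bc starts where env_ab ended.
last_ab = run_episode(env_ab, policy)
last_bc = run_episode(env_bc, policy, start=last_ab)
```

The key point is only the hand-off: the second environment's reset accepts the first agent's final state as its start position.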

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5

Top GitHub Comments

1 reaction
PartiallyTyped commented, Jun 7, 2020

Hmm, it might be possible, but we do not recommend calling train repeatedly in a row (it might redo initializations, possibly leaking memory and erasing any optimizer statistics).

Right, yes, I did that once and it took a long time to learn anything…

When I tried to do something similar for hierarchical algorithms, I used generators.

@mhtb32 If you are comfortable with the source code, you can use generators. You have to change the learn function to call yield when you reach the target. Then it boils down to:

agent1 = ...
agent2 = ...
g1 = agent1.learn(...)
g2 = agent2.learn(...)
while your_condition:
    next(g1)
    next(g2)

But you will have to slightly modify the source.
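To make the generator idea concrete: if `learn` is rewritten to `yield` after every episode (or on reaching the target), two training loops can be interleaved. The `learn` below is a toy stand-in, not the stable_baselines implementation:

```python
def learn(name, episodes, log):
    """Toy stand-in for a modified learn() that yields between episodes."""
    for ep in range(episodes):
        # ... one episode of training would happen here ...
        log.append((name, ep))
        yield ep  # hand control back so the other agent can train

log = []
g1 = learn("agent1", episodes=2, log=log)
g2 = learn("agent2", episodes=2, log=log)

for _ in range(2):  # "while your_condition" in the sketch above
    next(g1)
    next(g2)
```

Each `next()` advances one agent by one episode, so the two agents train in strict alternation without re-entering `learn` from scratch.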

1 reaction
Miffyli commented, Jun 7, 2020

You can do it with just the callbacks. You can override the _on_step function to start learning the second agent when it reaches checkpoint B.

Hmm, it might be possible, but we do not recommend calling train repeatedly in a row (it might redo initializations, possibly leaking memory and erasing any optimizer statistics).
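A rough sketch of the callback idea: in stable_baselines the callback would subclass BaseCallback and override `_on_step`. Here BaseCallback is a minimal stand-in so the sketch runs without the library, and `reached_checkpoint_b` / `start_second_agent` are hypothetical hooks you would supply:

```python
class BaseCallback:
    """Stand-in for stable_baselines' BaseCallback (not the real class)."""
    def on_step(self):
        return self._on_step()


class HandoffCallback(BaseCallback):
    def __init__(self, reached_checkpoint_b, start_second_agent):
        self.reached_checkpoint_b = reached_checkpoint_b
        self.start_second_agent = start_second_agent
        self.handed_off = False

    def _on_step(self):
        # Once agent 1 reaches checkpoint B, kick off agent 2's training.
        if not self.handed_off and self.reached_checkpoint_b():
            self.start_second_agent()
            self.handed_off = True
        return True  # returning True keeps agent 1's training running


# Toy usage: the checkpoint test fires immediately, triggering the hand-off.
events = []
cb = HandoffCallback(lambda: True, lambda: events.append("agent2"))
cb.on_step()
```

In real use you would pass the callback to `learn(..., callback=...)` and have `reached_checkpoint_b` inspect the rollout state instead of returning a constant.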

Read more comments on GitHub >

Top Results From Across the Web

A Survey on Multi-Task Learning - arXiv
Instance - based MTL identifies useful data instances in a task for other tasks and then shares knowledge via the identified instances.
Read more >
An Overview of Multi-Task Learning in Deep Neural Networks
This blog post gives an overview of multi-task learning in deep neural networks. It discusses existing approaches as well as recent ...
Read more >
c# - Running multiple async tasks and waiting for them all to ...
Awaiting each task sequentially, as your answer suggests, is rarely a good idea. If you decide that leaking fire-and-forget tasks is OK for...
Read more >
Multi-Channel Interactive Reinforcement Learning ... - Frontiers
Reinforcement learning is a powerful tool for this as it allows for a robot to learn and improve on how to combine skills...
Read more >
Learning to Prompt for Continual Learning - Google AI Blog
However, in continual learning, these two tasks arrive sequentially, and the model only has access to the training data of the current task....
Read more >
