[question] Training sequential tasks on multiple instances of an environment
I want to train an agent to reach point B from point A and then reach point C from point B. The idea is to train two separate agents: one learns the A -> B move and the other learns the B -> C move. There are several ways to do this:
- Create two environment instances and in the second one, initialize the second agent’s position randomly around point B
- Create two environment instances, and initialize the second agent’s position in the second environment to the last point the first agent visited in the first environment. For this, we would train agent 1 for one episode, then train agent 2 for one episode, and repeat this loop over and over (a rough sketch of this is shown after the question below).
Is it possible to implement the second idea with stable_baselines?
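Below is a rough, untested sketch of what the second idea could look like with stable_baselines (v2 API). Everything environment-related here is a placeholder: `PointEnv` is a made-up toy environment, and the `agent_position` info key, the coordinates, and the step budgets are assumptions; substitute your own environment and hyperparameters.

```python
import gym
import numpy as np
from stable_baselines import PPO2


class PointEnv(gym.Env):
    """Toy placeholder env: a 2-D point that should reach goal_pos from start_pos."""

    def __init__(self, start_pos, goal_pos):
        super().__init__()
        self.start_pos = np.asarray(start_pos, dtype=np.float32)
        self.goal_pos = np.asarray(goal_pos, dtype=np.float32)
        self.observation_space = gym.spaces.Box(-10.0, 10.0, shape=(2,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self):
        self.pos = self.start_pos.copy()
        self.n_steps = 0
        return self.pos.copy()

    def step(self, action):
        self.pos = np.clip(self.pos + 0.1 * np.asarray(action), -10.0, 10.0)
        self.n_steps += 1
        dist = float(np.linalg.norm(self.pos - self.goal_pos))
        done = dist < 0.1 or self.n_steps >= 200
        return self.pos.copy(), -dist, done, {'agent_position': self.pos.copy()}


POINT_A, POINT_B, POINT_C = (0.0, 0.0), (5.0, 0.0), (5.0, 5.0)  # made-up coordinates
STEPS_PER_ROUND = 2048

env_1 = PointEnv(start_pos=POINT_A, goal_pos=POINT_B)     # agent 1 learns A -> B
env_2 = PointEnv(start_pos=POINT_B, goal_pos=POINT_C)     # agent 2 learns B -> C
eval_env = PointEnv(start_pos=POINT_A, goal_pos=POINT_B)  # separate copy for rollouts
agent_1 = PPO2('MlpPolicy', env_1, verbose=0)
agent_2 = PPO2('MlpPolicy', env_2, verbose=0)

for _ in range(10):
    # Train agent 1 for a while, then roll it out once to see where it ends up.
    agent_1.learn(total_timesteps=STEPS_PER_ROUND)
    obs, done, info = eval_env.reset(), False, {}
    while not done:
        action, _ = agent_1.predict(obs)
        obs, _, done, info = eval_env.step(action)

    # Make agent 2's next episodes start from agent 1's final position, then train it.
    env_2.start_pos = np.asarray(info['agent_position'], dtype=np.float32)
    agent_2.learn(total_timesteps=STEPS_PER_ROUND)
```

Note that this calls `learn()` repeatedly, which the maintainer comment below advises against (re-initialization and loss of optimizer statistics), so treat it as a starting point rather than a recommended pattern.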
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Right, yes, I did that once and it took a long time to learn anything…
When I tried to do something similar for hierarchical algorithms, I used generators.
@mhtb32 If you are comfortable with the source code, you can use generators. You have to change the `learn` function to call `yield` when you reach the target. Then it boils down to alternating between the two generators, but you will have to slightly modify the source.
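The commenter's own snippet is not shown here; the following is only a hedged illustration of the generator idea, reusing `PointEnv`, `PPO2`, `np`, and the `POINT_*` coordinates from the sketch under the question above. It shows the control flow only: in a real implementation the `yield` would live inside a modified copy of `learn()` so that gradient updates still happen between episodes.

```python
def run_until_target(model, env):
    """Generator: roll the policy out and yield the final position each time
    the target (episode end) is reached. In a modified learn() the training
    update would happen before the yield; this sketch shows control flow only."""
    while True:
        obs, done, info = env.reset(), False, {}
        while not done:
            action, _ = model.predict(obs)
            obs, _, done, info = env.step(action)
        yield info['agent_position']


env_1 = PointEnv(start_pos=POINT_A, goal_pos=POINT_B)
env_2 = PointEnv(start_pos=POINT_B, goal_pos=POINT_C)
agent_1 = PPO2('MlpPolicy', env_1, verbose=0)
agent_2 = PPO2('MlpPolicy', env_2, verbose=0)

gen_1 = run_until_target(agent_1, env_1)
gen_2 = run_until_target(agent_2, env_2)

for _ in range(100):
    last_point = next(gen_1)   # agent 1 runs until it reaches (or times out near) B
    env_2.start_pos = np.asarray(last_point, dtype=np.float32)
    next(gen_2)                # agent 2 then continues from that point
```

To make this actually train rather than just roll out, the `yield` (or an equivalent callback) would have to be placed inside the library's learn loop, which is the source modification the commenter mentions.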
Hmm it might be possible, but we do not recommend calling `train` repeatedly in a row (it might redo initializations and such every time, possibly leaking memory and also erasing any optimizer statistics).