[question] Custom environment recommendations
Hi! I am trying to create some RL-based agents in my custom Unity ML-Agents environment. I implemented all the required functions in the Env, but I have several questions:
- Should the environment reset itself after a particular number of timesteps? In my case it is important to learn behavior from different perspectives (which are regenerated on each reset), yet I did not find any `env.reset()` calls in the learn function. Should I call it myself, call `learn()` repeatedly, or do something else?
- What happens when the agent reaches its target (when the environment is "done")? I noticed some freezing when one of the agents sends a `done` signal. Should the environment handle this situation and reset that agent's environment itself, or should it ignore the signal and wait for all environments to reset? Is this somehow taken care of in the baselines lib?
For now I am using the A2C algorithm with several concurrent environments. Of course, I can provide any necessary additional information about my setup; a rough sketch is below.
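To give an idea of the setup, it looks roughly like this (`MyUnityEnv`, the `my_envs` module, and `worker_id` are placeholders for my actual Unity ML-Agents wrapper, not real names):

```python
from stable_baselines import A2C
from stable_baselines.common.vec_env import SubprocVecEnv


def make_env(rank):
    def _init():
        # MyUnityEnv / worker_id are placeholders for the actual Unity ML-Agents wrapper
        from my_envs import MyUnityEnv
        return MyUnityEnv(worker_id=rank)
    return _init


if __name__ == '__main__':
    n_envs = 4
    env = SubprocVecEnv([make_env(i) for i in range(n_envs)])
    model = A2C('MlpPolicy', env, verbose=1)
    model.learn(total_timesteps=100000)  # no manual env.reset() calls anywhere
```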
Top GitHub Comments
I recommend just returning `done = True` after some timeout even if the target is not reached. Be sure to omit any terminal rewards in that case. Do not call `reset()` manually.
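A minimal sketch of what that can look like in a custom Gym env (the class name, the observation/action spaces, and the `max_episode_steps` value below are placeholders, not your actual setup):

```python
import numpy as np
import gym
from gym import spaces


class MyCustomEnv(gym.Env):
    """Placeholder for the custom Unity ML-Agents environment."""

    def __init__(self, max_episode_steps=500):
        super(MyCustomEnv, self).__init__()
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(8,), dtype=np.float32)
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
        self.max_episode_steps = max_episode_steps
        self._step_count = 0

    def reset(self):
        self._step_count = 0
        # regenerate the scene / perspective here
        return self.observation_space.sample()

    def step(self, action):
        self._step_count += 1
        obs = self.observation_space.sample()  # replace with the real observation
        reward = 0.0
        target_reached = False  # replace with the real success check

        if target_reached:
            reward += 1.0  # terminal reward only when the target is actually reached
            done = True
        elif self._step_count >= self.max_episode_steps:
            done = True  # timeout: end the episode without any terminal reward
        else:
            done = False

        return obs, reward, done, {}
```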
Hello,
It depends on which algorithm you are using. For instance, PPO2/A2C use a VecEnv that resets automatically (as stated in the doc). For other algorithms, like SAC, the reset is explicit.
I assume you are talking about a VecEnv, in which case the answer is in the previous paragraph 😉
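As a small illustration of the automatic reset (CartPole is used here purely as a stand-in for your custom environment):

```python
import gym
from stable_baselines.common.vec_env import DummyVecEnv

env = DummyVecEnv([lambda: gym.make('CartPole-v1')])
obs = env.reset()

for _ in range(1000):
    actions = [env.action_space.sample()]
    obs, rewards, dones, infos = env.step(actions)
    # When dones[0] is True, the VecEnv has already called reset() on the
    # underlying environment and obs is the first observation of the new
    # episode, so nothing needs to be done manually here.
```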
Btw, if you are using A2C with continuous actions, there is a bug in the current implementation that is fixed in #206 (it will be merged soon, and the fix is only one line of code). I would recommend that you either use PPO2 (until it is merged) or fix the code yourself (see commit https://github.com/hill-a/stable-baselines/pull/206/commits/689afd16f5b07d2fead1fa5e8474a8efa2826a64).
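For example, switching to PPO2 is just the following (reusing the same `env` VecEnv as before):

```python
from stable_baselines import PPO2

# PPO2 sidesteps the A2C continuous-action issue until the fix from #206 is merged
model = PPO2('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=100000)
```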