[question] Custom environment recommendations
Hi! I am trying to create some RL-based agents in my custom Unity ML-Agents environment. I implemented all the required functions in the Env, but I have several questions:
- Should the environment reset itself after a particular number of timesteps? In my case it is important to learn behavior from different perspectives (which are regenerated on each reset), yet I did not find any `env.reset()` calls in the learn function. Should I call it myself, call `learn()` repeatedly, or do something else?
- What happens when the agent reaches its target (when the environment is "done")? I noticed some freezing when one of the agents sends a `done` signal. Should the environment handle this situation and reset that agent's environment itself, or should it ignore the signal and wait for all environments to reset? Is this somehow taken care of in the baselines lib?
For now I am using the A2C algorithm with several concurrent environments. Of course, I can provide any necessary additional information about my setup; a rough sketch is below.
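To give an idea of the setup, it looks roughly like this (`MyUnityEnv`, the `my_envs` module, and `worker_id` are placeholders for my actual Unity ML-Agents wrapper, not real names):

```python
from stable_baselines import A2C
from stable_baselines.common.vec_env import SubprocVecEnv


def make_env(rank):
    def _init():
        # MyUnityEnv / worker_id are placeholders for the actual Unity ML-Agents wrapper
        from my_envs import MyUnityEnv
        return MyUnityEnv(worker_id=rank)
    return _init


if __name__ == '__main__':
    n_envs = 4
    env = SubprocVecEnv([make_env(i) for i in range(n_envs)])
    model = A2C('MlpPolicy', env, verbose=1)
    model.learn(total_timesteps=100000)  # no manual env.reset() calls anywhere
```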
Top GitHub Comments
I recommend just returning `done = True` after some timeout even if the target is not reached. Be sure to omit any terminal rewards in that case. Do not call `reset()` manually.
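A minimal sketch of what that can look like in a custom Gym env (the class name, the observation/action spaces, and the `max_episode_steps` value below are placeholders, not your actual setup):

```python
import numpy as np
import gym
from gym import spaces


class MyCustomEnv(gym.Env):
    """Placeholder for the custom Unity ML-Agents environment."""

    def __init__(self, max_episode_steps=500):
        super(MyCustomEnv, self).__init__()
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(8,), dtype=np.float32)
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
        self.max_episode_steps = max_episode_steps
        self._step_count = 0

    def reset(self):
        self._step_count = 0
        # regenerate the scene / perspective here
        return self.observation_space.sample()

    def step(self, action):
        self._step_count += 1
        obs = self.observation_space.sample()  # replace with the real observation
        reward = 0.0
        target_reached = False  # replace with the real success check

        if target_reached:
            reward += 1.0  # terminal reward only when the target is actually reached
            done = True
        elif self._step_count >= self.max_episode_steps:
            done = True  # timeout: end the episode without any terminal reward
        else:
            done = False

        return obs, reward, done, {}
```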
Hello,
It depends on which algorithm you are using. For instance, PPO2/A2C use a VecEnv that resets automatically (as stated in the doc). For other algorithms, like SAC, the reset is explicit.
I assume you are talking about a VecEnv, in which case the answer is in the previous paragraph 😉
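As a small illustration of the automatic reset (CartPole is used here purely as a stand-in for your custom environment):

```python
import gym
from stable_baselines.common.vec_env import DummyVecEnv

env = DummyVecEnv([lambda: gym.make('CartPole-v1')])
obs = env.reset()

for _ in range(1000):
    actions = [env.action_space.sample()]
    obs, rewards, dones, infos = env.step(actions)
    # When dones[0] is True, the VecEnv has already called reset() on the
    # underlying environment and obs is the first observation of the new
    # episode, so nothing needs to be done manually here.
```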
Btw, if you are using A2C with continuous actions, there is a bug in the current implementation that is fixed in #206 (it will be merged soon, and the fix is only one line of code). I would recommend that you either use PPO2 (until it is merged) or fix the code yourself (see commit https://github.com/hill-a/stable-baselines/pull/206/commits/689afd16f5b07d2fead1fa5e8474a8efa2826a64).
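For example, switching to PPO2 is just the following (reusing the same `env` VecEnv as before):

```python
from stable_baselines import PPO2

# PPO2 sidesteps the A2C continuous-action issue until the fix from #206 is merged
model = PPO2('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=100000)
```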