Support dynamic action_space A(s)
I was wondering: does this library support dynamically updating the action_space during agent training? I need to put constraints on my model that disallow specific actions given the current state.
Right now, the code inside my step function does something like the following:
```python
def step(self, a):
    # Advance the environment with the transition model.
    next_state = model(self.current_state, a)
    # Recompute the set of legal actions for the new state.
    self.action_space = action_dist(next_state)
```
I would expect the agent to pick up the new action space and sample from it on the next iteration, but it seems that baselines grabs hold of action_space during init and stores it. Can you point me to the place in the code where baselines samples from the action_space we create inside init? I wonder if I can make some changes to the code so that the action_space can update dynamically based on the state.
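For context, the pattern that causes this is sketched below (a minimal sketch with hypothetical names, not the original poster's actual environment): a Gym environment declares its action_space once in __init__, and baselines/SB3 read that attribute when the model is constructed, so reassigning it inside step() is never seen by the already-built policy.

```python
import numpy as np
import gym
from gym import spaces

class ConstrainedEnv(gym.Env):
    """Minimal sketch: the full action set is declared once at construction.

    The RL algorithm reads `action_space` here, when the model is built, so
    reassigning it later inside `step()` does not change the policy's output
    layer. Per-state legality has to be handled another way (e.g. masking).
    """

    def __init__(self):
        self.action_space = spaces.Discrete(5)  # full, fixed action set
        self.observation_space = spaces.Box(
            low=-1.0, high=1.0, shape=(3,), dtype=np.float32
        )
```

Because the policy's output layer is sized from this attribute at construction time, per-state legality is usually handled with action masking rather than by mutating the space.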
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
SB3 uses the action space's sample() method only for off-policy algorithms during the “warmup phase”, not for A2C/PPO or other on-policy algorithms. I doubt this is different in other codebases.
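To make the comment above concrete, the warmup behaviour looks roughly like this (an illustrative sketch, not the actual SB3 source; `learning_starts` mirrors SB3's hyperparameter of the same name):

```python
def select_action(model, env, obs, num_timesteps, learning_starts=1_000):
    """Illustrative sketch of off-policy action selection during training."""
    if num_timesteps < learning_starts:
        # Warmup phase: uniform random exploration via action_space.sample().
        return env.action_space.sample()
    # After warmup the action comes from the learned policy, so a mutated
    # action_space attribute is never consulted again.
    action, _ = model.predict(obs, deterministic=False)
    return action
```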
Ok, thanks for answering my questions. I’m pretty new to designing a real-world RL algorithm and learning a lot. I have a solution using the SB3-Contrib package.
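For readers hitting the same problem, the SB3-Contrib approach referred to here is presumably invalid-action masking with MaskablePPO. A minimal sketch, assuming a discrete action space; "YourEnv-v0" and legal_actions() are placeholders for your own environment id and masking logic:

```python
import gym
from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker

def mask_fn(env):
    # Boolean array of shape (n_actions,): True means the action is legal
    # in the current state. `legal_actions()` is a hypothetical helper on
    # the underlying environment.
    return env.unwrapped.legal_actions()

env = gym.make("YourEnv-v0")        # placeholder environment id
env = ActionMasker(env, mask_fn)    # expose per-state masks to the algorithm
model = MaskablePPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)
```

With this wrapper the full action space stays fixed while the mask tells the policy which actions are currently legal, which avoids mutating action_space during training.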