[Question] Support for infeasible regions of the action domain
Important Note: We do not do technical support, nor consulting and don’t answer personal questions per email. Please post your question on the RL Discord, Reddit or Stack Overflow in that case.
Question
My question concerns environments where not every action is feasible at control time. My action space is normalized to [-1, 1], but at a given timestep only [-0.4, 0.4] may be feasible, or only [-0.1, 0.2] (not necessarily symmetric). When I train my env with stable-baselines3, the actions received by my environment are far outside [-1, 1], and I think this is caused by my feasible domain not being fixed or symmetric.
How should I deal with this issue? This limitation comes from a physical constraint over which I have no control.
Additional context
Example of the feasible action bounds (already normalized to [-1, 1]) at two different timesteps:
Timestep 1:
```python
In [10]: env.action_space.low
Out[10]: array([-1., -0.4101973, -0.5177778, -0.90388674, -1.], dtype=float32)

In [11]: env.action_space.high
Out[11]: array([1., 1., 1., 0.8961133, 0.45055556], dtype=float32)
```
Timestep 2:
```python
In [18]: env.action_space.low
Out[18]: array([-1., -0.35272583, -0.90092593, -0.09008063, -1.], dtype=float32)

In [19]: env.action_space.high
Out[19]: array([0.52294207, 1., 1., 1., 0.37092593], dtype=float32)
```
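For reference, the snapshots above come from the env redefining its declared action_space at each timestep. A hypothetical reconstruction of that pattern (the helper name and the Gymnasium import are assumptions, since the original code was not posted) looks like the sketch below; this is the part the answer further down replaces:

```python
import numpy as np
from gymnasium import spaces

def set_feasible_bounds(env, low, high):
    # Anti-pattern: redefining the declared action space at control time.
    # SB3 captures the action space once at model creation, so bounds
    # changed here after training starts are not seen by the agent.
    env.action_space = spaces.Box(
        low=np.asarray(low, dtype=np.float32),
        high=np.asarray(high, dtype=np.float32),
        dtype=np.float32,
    )
```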
Checklist
- I have read the documentation (required)
- I have checked that there is no similar issue in the repo (required)
Top GitHub Comments
Hmm, if your environment is still Markovian, it should not be an issue. For instance, SAC uses a Q-value function, Q(state, action), so the value depends on both the state and the action. If by that you mean keeping a fixed “exposed” action space with limits [-1, 1] and then changing the limits inside the env, this should work.
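A minimal sketch of that suggestion, assuming the Gymnasium API (swap in gym for older setups). The class name FeasibleActionEnv, the dimensions N_ACT/N_STATE, and the bound-update rule are illustrative, not from the thread: the agent always sees a fixed [-1, 1] Box, the env clips each action into the current feasible range, and the feasible bounds are appended to the observation so the problem stays Markovian.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

N_ACT = 5    # hypothetical action dimension, matching the snippets above
N_STATE = 3  # hypothetical state dimension

class FeasibleActionEnv(gym.Env):
    """Expose a fixed [-1, 1] action space; enforce the time-varying
    feasible limits inside step(). Dynamics and reward are placeholders."""

    def __init__(self):
        # Fixed "exposed" action space: the agent always sees [-1, 1].
        self.action_space = spaces.Box(-1.0, 1.0, shape=(N_ACT,), dtype=np.float32)
        # Observation = raw state + current feasible low/high, so the
        # Q-function can condition on the limits (keeps it Markovian).
        self.observation_space = spaces.Box(
            -np.inf, np.inf, shape=(N_STATE + 2 * N_ACT,), dtype=np.float32
        )
        self._state = np.zeros(N_STATE, dtype=np.float32)
        self._low = -np.ones(N_ACT, dtype=np.float32)
        self._high = np.ones(N_ACT, dtype=np.float32)

    def _update_feasible_bounds(self):
        # Placeholder: recompute the physically feasible range for this step.
        self._low = self.np_random.uniform(-1.0, 0.0, N_ACT).astype(np.float32)
        self._high = self.np_random.uniform(0.0, 1.0, N_ACT).astype(np.float32)

    def _get_obs(self):
        return np.concatenate([self._state, self._low, self._high])

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._state = np.zeros(N_STATE, dtype=np.float32)
        self._update_feasible_bounds()
        return self._get_obs(), {}

    def step(self, action):
        # Clip into the feasible region instead of mutating self.action_space.
        feasible_action = np.clip(action, self._low, self._high)
        self._state = self._state  # placeholder: apply feasible_action to the plant
        reward = 0.0               # placeholder reward
        self._update_feasible_bounds()
        return self._get_obs(), reward, False, False, {}
```

With this pattern the declared action space stays static, so e.g. SAC("MlpPolicy", FeasibleActionEnv()) trains normally while the physical limits vary per step; whether clipping or rescaling [-1, 1] into [low, high] fits better depends on the plant.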
@araffin We conducted some experiments with your suggestion and the results were very good. Thanks for your help!