Filtering out artificial terminal states
In many gym environments, like MountainCarContinuous, there is an episode step limit. This causes episodes to be terminated before the end of the trajectory is actually reached (in this case, reaching the top of the hill).
Saving these transitions to the replay buffer without changing the artificial terminal flags to False (as is done, for example, here) leads to incorrect TD errors, because the bootstrapped value of the next state is dropped from the target. Since the agent has not actually reached the end of the trajectory, I think its prediction of future rewards should still be taken into account.
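A minimal sketch of the one-step TD target, y = r + gamma * (1 - d) * V(s'), makes the effect concrete. The function name `td_target` and the numbers are illustrative assumptions, not taken from any library. With an artificial `done=True`, the bootstrapped term gamma * V(s') vanishes and the target collapses to the immediate reward.

```python
def td_target(reward: float, next_value: float, done: bool, gamma: float = 0.99) -> float:
    """One-step TD target: y = r + gamma * (1 - done) * V(s').

    `next_value` is the critic's estimate V(s') for the next state.
    (Hypothetical helper for illustration only.)
    """
    return reward + gamma * (1.0 - float(done)) * next_value


# Same transition cut off by the time limit, labelled two ways:
wrong = td_target(-1.0, 10.0, done=True, gamma=0.5)   # bootstrap dropped
right = td_target(-1.0, 10.0, done=False, gamma=0.5)  # bootstrap kept
print(wrong, right)
```

Running it prints `-1.0 4.0`: treating the time-limit cut as terminal discards the 5.0 of estimated future return the agent would still have collected.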
This is why some implementations, such as OpenAI SpinningUp, reset these terminal flags before storing the experience:
```python
# From OpenAI SpinningUp source code

# Ignore the "done" signal if it comes from hitting the time
# horizon (that is, when it's an artificial terminal signal
# that isn't based on the agent's state)
d = False if ep_len==max_ep_len else d
# Store experience to replay buffer
replay_buffer.store(o, a, r, o2, d)
```
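The same idea can be sketched as a self-contained collection loop that keeps the time limit as a separate flag instead of overwriting `d` afterwards. `ToyEnv`, `collect`, and the tuple layout of the buffer are hypothetical stand-ins chosen for illustration; they are not part of SpinningUp or gym.

```python
from typing import Callable, List, Tuple

Transition = Tuple[int, int, float, int, bool]  # (o, a, r, o2, terminated)


class ToyEnv:
    """Hypothetical 1-D walk used only for illustration (not a gym environment).
    Reaching position `goal` is a genuine terminal state; the env itself has no time limit."""

    def __init__(self, goal: int = 5) -> None:
        self.goal = goal
        self.pos = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int) -> Tuple[int, float, bool]:
        # action is +1 or -1; returns (next_obs, reward, terminated)
        self.pos = max(0, self.pos + action)
        terminated = self.pos >= self.goal
        reward = 1.0 if terminated else -0.1
        return self.pos, reward, terminated


def collect(env: ToyEnv, num_steps: int, max_ep_len: int,
            policy: Callable[[int], int]) -> List[Transition]:
    buffer: List[Transition] = []
    o, ep_len = env.reset(), 0
    for _ in range(num_steps):
        a = policy(o)
        o2, r, terminated = env.step(a)
        ep_len += 1
        # The time limit is tracked here as its own flag, never merged into `terminated`
        truncated = ep_len == max_ep_len and not terminated
        # Store only the genuine terminal flag, so a time-limit cut still
        # lets the TD target bootstrap from V(o2)
        buffer.append((o, a, r, o2, terminated))
        o = o2
        if terminated or truncated:
            o, ep_len = env.reset(), 0
    return buffer


# Episodes that only ever hit the time limit: no transition is marked terminal.
buf = collect(ToyEnv(goal=5), num_steps=6, max_ep_len=3, policy=lambda o: -1)
# Goal reached on exactly the last allowed step: still a real terminal.
buf2 = collect(ToyEnv(goal=5), num_steps=5, max_ep_len=5, policy=lambda o: 1)
```

The second call shows why a separate flag is safer: there the goal is reached when `ep_len == max_ep_len`, so the rule `d = False if ep_len==max_ep_len else d` would also clear the flag on that genuine terminal transition, while this loop keeps it as `True`.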
Issue created 3 years ago; 6 comments.
Top GitHub Comments
Answered here https://github.com/DLR-RM/stable-baselines3/issues/829
I created a branch on SB3, but it is in fact a bit trickier than expected (notably because VecEnv resets automatically): https://github.com/DLR-RM/stable-baselines3/compare/feat/remove-timelimit
For A2C/PPO or any n-step methods, we would need to keep track of two types of termination signals…
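For n-step or Monte Carlo style returns, the two signals could be kept as separate arrays, roughly as in the sketch below. The function `discounted_returns` and its arguments are hypothetical and not SB3's implementation. It assumes the value of the final observation of a truncated episode is available in `final_values`; because VecEnv resets automatically, that observation has to be saved before the reset (SB3's VecEnvs, for instance, put it in the step info under the `terminal_observation` key).

```python
from typing import List


def discounted_returns(rewards: List[float], terminated: List[bool], truncated: List[bool],
                       final_values: List[float], last_value: float,
                       gamma: float) -> List[float]:
    """Discounted returns for one env's rollout segment (hypothetical helper).

    terminated[t]: transition t reached a real terminal state -> bootstrap with 0.
    truncated[t]:  transition t was cut by the time limit -> bootstrap with
                   final_values[t] = V(final observation saved before auto-reset).
    last_value:    V(s_T) for the observation after the segment's last step,
                   used only when that step ended neither way.
    """
    returns = [0.0] * len(rewards)
    running = last_value
    for t in reversed(range(len(rewards))):
        if terminated[t]:
            running = 0.0  # true end of trajectory: nothing left to bootstrap
        elif truncated[t]:
            running = final_values[t]  # artificial end: keep the value estimate
        # Otherwise `running` is already the return from step t + 1
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


rets = discounted_returns(
    rewards=[1.0, 1.0, 1.0, 1.0],
    terminated=[False, True, False, False],
    truncated=[False, False, False, True],
    final_values=[0.0, 0.0, 0.0, 4.0],
    last_value=100.0,  # ignored here: the last step was truncated
    gamma=0.5,
)
print(rets)
```

This prints `[1.5, 1.0, 2.5, 3.0]`. Merging the two signals into a single `done` would give 1.0 instead of 3.0 for the last step, because the time-limit cut would be treated as a real terminal.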