Filtering out artificial terminal states

Many Gym environments, such as MountainCarContinuous, impose an episode step limit. As a result, an episode can terminate before the agent actually reaches the end of the trajectory (in this case, the top of the hill).

Saving these experiences to the buffer without changing the artificial terminals to False (as happens, for example, in the code linked from the original issue) leads to an error when computing TD errors. I think the agent's prediction of future rewards should still be taken into account when it has not yet reached the real end of the trajectory.
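To make the size of that error concrete, here is a minimal sketch of a one-step TD target. The function name `td_target`, the discount `gamma`, and the numbers are all illustrative assumptions, not taken from the issue or from any library.

```python
def td_target(reward, next_value, done, gamma=0.99):
    """One-step TD target: r + gamma * (1 - done) * V(s')."""
    return reward + gamma * (1.0 - float(done)) * next_value


# Hypothetical transition that was cut off by the step limit.
reward, next_value = -0.1, 5.0

# Time-limit "done" stored as True: the bootstrap term is dropped,
# as if no reward could ever follow this state.
biased = td_target(reward, next_value, done=True)

# Same transition with the artificial terminal reset to False:
# the value estimate of the next state is kept in the target.
corrected = td_target(reward, next_value, done=False)

print(biased, corrected)
```

The gap between the two targets is `gamma * next_value`, so the bias grows with the value the agent expects from the state where the episode was cut off.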

This is why some implementations, such as OpenAI SpinningUp, change those terminal flags before saving the experience, like this:

"""From OpenAI SpinningUp source code"""

# Ignore the "done" signal if it comes from hitting the time
# horizon (that is when it's an artificial terminal signal
# that isn't based on the agent's state)
d = False if ep_len==max_ep_len else d

# Store experience to replay buffer
replay_buffer.store(o, a, r, o2, d)
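An alternative to comparing step counts is to read the flag set by Gym's `TimeLimit` wrapper. The sketch below assumes an older Gym version in which that wrapper writes `info["TimeLimit.truncated"]` when the step limit ends the episode. The helper name `done_for_buffer` is hypothetical and is not part of SpinningUp or Gym.

```python
def done_for_buffer(done, info):
    """Return the done flag to store, ignoring time-limit truncation.

    Assumes `info` may carry the "TimeLimit.truncated" key written by
    Gym's TimeLimit wrapper; a missing key is treated as "not truncated".
    """
    truncated = info.get("TimeLimit.truncated", False)
    return bool(done) and not truncated


# Genuine terminal state: keep done=True.
real_terminal = done_for_buffer(True, {})

# Episode ended only because of the step limit: store done=False.
time_limit = done_for_buffer(True, {"TimeLimit.truncated": True})

# Ordinary mid-episode step.
mid_episode = done_for_buffer(False, {})
```

One difference from the step-count check: `d = False if ep_len==max_ep_len else d` also clears a genuine terminal that happens to occur on the last allowed step, whereas a flag that is set only when the limit itself ended the episode does not.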

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6

Top GitHub Comments

1 reaction
araffin commented, Mar 23, 2022
1 reaction
araffin commented, Sep 30, 2020

I created a branch on SB3, but it is in fact a bit trickier than expected (notably because VecEnv resets automatically): https://github.com/DLR-RM/stable-baselines3/compare/feat/remove-timelimit

For A2C/PPO or any n-step method, we would need to keep track of two types of termination signals…
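As a rough illustration of what tracking two signals could look like for n-step returns, here is a sketch written for this page. It is not the code from the SB3 branch, and every name in it (`discounted_returns`, `terminated`, `truncated`, `next_values`) is an assumption. It also assumes that `next_values[t]` holds the value estimate of the true next state, which with an auto-resetting VecEnv would have to be taken from the last observation before the reset.

```python
def discounted_returns(rewards, next_values, terminated, truncated, gamma=0.99):
    """Backward pass over one rollout with two termination signals.

    terminated[t]: genuine terminal state, no bootstrapping.
    truncated[t]:  time limit hit, bootstrap from next_values[t].
    """
    returns = [0.0] * len(rewards)
    # If the rollout ends mid-episode, bootstrap from the last next state.
    running = next_values[-1]
    for t in reversed(range(len(rewards))):
        if terminated[t]:
            running = 0.0             # nothing follows a real terminal
        elif truncated[t]:
            running = next_values[t]  # the episode would have continued
        returns[t] = rewards[t] + gamma * running
        running = returns[t]
    return returns


rewards = [1.0, 1.0, 1.0]
next_values = [10.0, 10.0, 10.0]
no_flags = [False, False, False]
last_step = [False, False, True]

# Rollout ending in a genuine terminal state.
ret_terminated = discounted_returns(rewards, next_values, last_step, no_flags, gamma=0.5)

# Same rollout ending because of the step limit.
ret_truncated = discounted_returns(rewards, next_values, no_flags, last_step, gamma=0.5)
```

With a single merged done flag, both rollouts would produce the first set of returns, which is the bias the issue describes.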
