[Feature Request] Proper TimeLimit/Infinite Horizon Handling for On-Policy algorithm

🚀 Feature

Same as #284 but for on-policy algorithms. The current workaround is to use a TimeFeatureWrapper (cf. the RL Zoo).
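For readers unfamiliar with the workaround, here is a minimal sketch of the idea behind a time feature: the observation gets one extra entry telling the agent how much of the episode is left, so a time-limit cut-off is no longer invisible to the value function. The helper name `append_time_feature` and its arguments are hypothetical and are not the wrapper's actual API; the sketch also assumes the feature runs from 1.0 at the start of the episode to 0.0 at the time limit.

```python
from typing import List, Sequence


def append_time_feature(obs: Sequence[float], step: int, max_steps: int) -> List[float]:
    """Return a copy of `obs` with the fraction of remaining time appended.

    Hypothetical helper for illustration only (not the real wrapper).
    """
    # Clamp so the feature never drops below 0.0 if the episode overruns.
    remaining = 1.0 - min(step, max_steps) / max_steps
    return list(obs) + [remaining]


if __name__ == "__main__":
    print(append_time_feature([0.1, 0.2], step=50, max_steps=200))
```

A real wrapper would also have to extend the observation space by one dimension and reset the step counter at the start of each episode.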

### Checklist

  • I have checked that there is no similar issue in the repo (required)

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

2 reactions
Miffyli commented, Nov 5, 2021

> you are referring to L380 and 381?

Yup, and I understood your idea. Turns out I was wrong ^^'. I only noticed it now that I tried to type it out.

1) Current setup when termination is encountered in next step

delta = self.rewards[step] + ~self.gamma * next_values * next_non_terminal~ - self.values[step]
last_gae_lam = delta + ~self.gamma * self.gae_lambda * next_non_terminal * last_gae_lam~

2) Ideal setup where timeouts are handled correctly (next state is timeout termination)

delta = self.rewards[step] + self.gamma * next_values ~* next_non_terminal~ - self.values[step]   <-- bootstrap even though the step is done
last_gae_lam = delta + ~self.gamma * self.gae_lambda * next_non_terminal * last_gae_lam~   <-- avoid leaking from the next episode

3) Changing timeout reward to reward + next_value * gamma

delta = (reward + next_value * self.gamma) + ~self.gamma * next_values * next_non_terminal~ - self.values[step]   <-- same as in the example above
last_gae_lam = delta + ~self.gamma * self.gae_lambda * next_non_terminal * last_gae_lam~   <-- avoid leaking from the next episode
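The equivalence of cases 2 and 3 can be checked with a small standalone sketch. This is not the library's rollout-buffer code: the function names, the separate `terminated`/`truncated` flags, and the per-step `next_values` list are assumptions made for illustration. In particular, `next_values[t]` is assumed to hold the value estimate of the state actually reached after step `t`, before any environment reset.

```python
from typing import List, Sequence


def gae_with_timeouts(rewards: Sequence[float], values: Sequence[float],
                      next_values: Sequence[float], terminated: Sequence[bool],
                      truncated: Sequence[bool], gamma: float,
                      gae_lambda: float) -> List[float]:
    """Case 2: bootstrap on timeouts, but never carry GAE across episodes."""
    advantages = [0.0] * len(rewards)
    last_gae_lam = 0.0
    for step in reversed(range(len(rewards))):
        # Drop the bootstrap term only on a true termination.
        bootstrap = 0.0 if terminated[step] else 1.0
        # Cut the lambda recursion at any episode boundary.
        same_episode = 0.0 if (terminated[step] or truncated[step]) else 1.0
        delta = rewards[step] + gamma * next_values[step] * bootstrap - values[step]
        last_gae_lam = delta + gamma * gae_lambda * same_episode * last_gae_lam
        advantages[step] = last_gae_lam
    return advantages


def fold_timeout_bootstrap(rewards: Sequence[float], next_values: Sequence[float],
                           truncated: Sequence[bool], gamma: float) -> List[float]:
    """Case 3: replace the reward at timeout steps by reward + gamma * next_value."""
    return [r + gamma * v if cut else r
            for r, v, cut in zip(rewards, next_values, truncated)]


def gae_standard(rewards: Sequence[float], values: Sequence[float],
                 next_values: Sequence[float], dones: Sequence[bool],
                 gamma: float, gae_lambda: float) -> List[float]:
    """Case 1: a single done flag masks both the bootstrap and the recursion."""
    advantages = [0.0] * len(rewards)
    last_gae_lam = 0.0
    for step in reversed(range(len(rewards))):
        next_non_terminal = 0.0 if dones[step] else 1.0
        delta = rewards[step] + gamma * next_values[step] * next_non_terminal - values[step]
        last_gae_lam = delta + gamma * gae_lambda * next_non_terminal * last_gae_lam
        advantages[step] = last_gae_lam
    return advantages


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0, 1.0]
    values = [0.5, 0.4, 0.3, 0.2]
    next_values = [0.4, 0.3, 0.9, 0.1]
    terminated = [False, False, False, True]
    truncated = [False, False, True, False]  # step 2 hits the time limit
    dones = [t or c for t, c in zip(terminated, truncated)]

    ideal = gae_with_timeouts(rewards, values, next_values,
                              terminated, truncated, 0.99, 0.95)
    hacked = gae_standard(fold_timeout_bootstrap(rewards, next_values, truncated, 0.99),
                          values, next_values, dones, 0.99, 0.95)
    print(ideal)
    print(hacked)
```

Under these assumptions both calls return the same advantages, which matches the conclusion in the thread. The sketch covers only the advantage computation for a single environment; as the follow-up comment warns, other parts of the training code that consume the modified reward would need to be checked separately.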

1 reaction
Miffyli commented, Nov 5, 2021

> so the conclusion is that my proposed hack is valid :p?

Yup, at least in this part of the code ^^. I would still rethink the whole process through carefully, as “hacks” like this often break something (and sadly it is hard to test).
