
Support learn() with total timesteps less than episode length

See original GitHub issue

In an imitation learning project, I’m alternating short calls to PPO2.learn() (i.e. a low total_timesteps) with gradient-descent updates to a discriminator, among other things.
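
For concreteness, the loop looks roughly like this. The Discriminator class here is just a placeholder for my project code; only the PPO2 construction and the learn() calls reflect the actual stable-baselines API.

```python
from stable_baselines import PPO2


class Discriminator:
    """Placeholder for the imitation-learning discriminator (project code)."""

    def update(self, model):
        pass  # gradient-descent step on expert vs. generated transitions


model = PPO2("MlpPolicy", "CartPole-v1", n_steps=128)
discriminator = Discriminator()

for _ in range(100):
    # Short call: 256 timesteps is less than a full CartPole episode (up to 500 steps).
    model.learn(total_timesteps=256, reset_num_timesteps=False)
    discriminator.update(model)
```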

The PPO2 updates themselves seem to be working fine. However, logging has a few problems, because some logging state is not kept between different calls to learn(). Even though I use .learn(reset_num_timesteps=False), which seems to exist precisely to allow logging to continue across multiple calls to PPO2.learn(), we run into these problems:

(1) We initialize a new Runner every time we call learn, even if reset_num_timesteps=False. This forces the environment to reset(), thus biasing the logger towards reporting the reward and episode length means of shorter episodes (longer episodes don’t get to finish by the end of the training loop). As an example, when training CartPole, my expert imitation policies (mean return: 500) often show mean return around 300 in my PPO2 logs.

(2) The ep_info_buf is reset every time we call learn() instead of being preserved. This makes the training curve more jagged and, again, biased toward shorter episodes at the beginning of each new call to learn().

I’m wondering if the maintainers would be interested in some sort of PR that fixes these logging discrepancies.

I propose that reset_num_timesteps=False (perhaps renamed to reset_log_state) should make PPO2 keep the state from (1) and (2).
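
Concretely, something along these lines could work. This is only a sketch of the intended behaviour; names like _make_runner are hypothetical and not the actual stable-baselines internals.

```python
from collections import deque


def _setup_learn_state(self, reset_num_timesteps):
    """Sketch: only rebuild the Runner and ep_info_buf when explicitly asked to."""
    if reset_num_timesteps or self.runner is None:
        self.runner = self._make_runner()       # fresh Runner -> env.reset()
        self.ep_info_buf = deque(maxlen=100)    # fresh episode-info buffer
    # Otherwise reuse both, so partially finished episodes keep running and
    # the logged means are no longer biased toward short episodes.
```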

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 16

Top GitHub Comments

1 reaction
AdamGleave commented, Dec 18, 2019

After talking to @shwang, the problem is more serious than I thought: it affects learning, not just logging. AbstractEnvRunner calls env.reset in its __init__ method, so any algorithm that uses a Runner will never see an episode end during training if you call learn with fewer timesteps than the episode length! This would cause big problems in environments with a sparse reward at the end of an episode.

Admittedly this use case is rare, so it’s fairly low-severity, but I think we should treat learning breaking in this setting as a bug.
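
To make the failure mode concrete, here is a toy reproduction that only mimics what a freshly constructed Runner does; no stable-baselines code is involved, and the environment and helper below are made up purely for illustration:

```python
import gym
from gym import spaces


class LongEpisodeEnv(gym.Env):
    """Toy env: every episode lasts 500 steps, with the only reward at the end."""

    observation_space = spaces.Discrete(1)
    action_space = spaces.Discrete(2)

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        done = self.t >= 500
        reward = 1.0 if done else 0.0  # sparse reward at the end of the episode
        return 0, reward, done, {}


def collect_rollout(env, n_steps):
    """Stand-in for one learn() call with a fresh Runner: reset, then roll n_steps."""
    env.reset()  # the reset in AbstractEnvRunner.__init__
    saw_episode_end = False
    for _ in range(n_steps):
        _, _, done, _ = env.step(env.action_space.sample())
        saw_episode_end = saw_episode_end or done
    return saw_episode_end


env = LongEpisodeEnv()
# 128 steps per call, and a new Runner (hence a reset) every call:
print(any(collect_rollout(env, n_steps=128) for _ in range(100)))  # -> False
```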

0 reactions
araffin commented, Jul 16, 2020

This should be fixed in v3, but it needs to be checked; linking https://github.com/DLR-RM/stable-baselines3/issues/1
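
For reference, a quick check in stable-baselines3 would look like this (same pattern as above; whether the environment state and episode-info buffer actually persist across calls is exactly what still needs to be verified):

```python
from stable_baselines3 import PPO

model = PPO("MlpPolicy", "CartPole-v1", n_steps=128)
for _ in range(10):
    # Repeated short calls, fewer timesteps per call than one full episode.
    model.learn(total_timesteps=256, reset_num_timesteps=False)
```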

Read more comments on GitHub >

Top Results From Across the Web

  • Understanding the total_timesteps parameter in stable ...
    According to the stable-baselines source code, total_timesteps is the number of steps in total the agent will do for any environment.
  • Reinforcement Learning in Python with Stable Baselines 3
    This allows us to see the actual total number of timesteps for the model rather than resetting every iteration. We're also setting a...
  • Reinforcement Learning Tips and Tricks - Stable Baselines
    The aim of this section is to help you doing reinforcement learning ... Looking at the training curve (episode reward function of the...
  • How to model episodic task with pre-determined total time?
    I want to model a problem as an MDP and solve it with reinforcement learning algorithms. Suppose that the problem is episodic and...
  • Stable Baselines Documentation - Read the Docs
    You can find a migration guide in SB3 documentation. 1.3 Reinforcement Learning Tips and Tricks. The aim of this section is to help...
