
LSTM policies are broken for PPO1 and TRPO


See the feature/fix_lstm branch for a test that fails for the above-mentioned algorithms.
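For reference, the failure can be reproduced along these lines (a minimal sketch assuming the standard stable-baselines API, not the actual test from that branch):

from stable_baselines import PPO1, TRPO

# Constructing either algorithm with a recurrent policy is enough to
# trigger the failure during policy/graph construction.
for algo in (PPO1, TRPO):
    model = algo('MlpLstmPolicy', 'CartPole-v1')
    model.learn(total_timesteps=100)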

For PPO1 and TRPO, the cause seems to be that the batch size is not provided to the policy (None is passed instead), which then breaks the orthogonal initializer.
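To illustrate the initializer problem (a simplified numpy sketch, not the library's exact ortho_init): an orthogonal initializer samples a Gaussian matrix of the requested shape and orthogonalizes it via SVD, so it cannot handle a shape with an undefined (None) dimension:

import numpy as np

def ortho_init(shape, scale=1.0):
    # Sample a Gaussian matrix of exactly this shape and orthogonalize
    # it via SVD; every dimension must therefore be a concrete integer.
    gaussian = np.random.normal(0.0, 1.0, shape)  # TypeError if a dim is None
    u, _, v = np.linalg.svd(gaussian, full_matrices=False)
    weights = u if u.shape == shape else v
    return (scale * weights).astype(np.float32)

ortho_init((64, 64))    # fine: fully specified shape
ortho_init((None, 64))  # raises TypeError: None is not a valid dimension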

For PPO2, the assert on line 109 fails:

assert self.n_envs % self.nminibatches == 0, "For recurrent policies, " \
    "the number of environments run in parallel should be a multiple of nminibatches."

Since the PPO2 instance is created with PPO2(policy, 'CartPole-v1'), the default parameters of PPO2 seem to be incompatible with recurrent policies.
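A plausible explanation (assuming the stable-baselines default of nminibatches=4): creating PPO2 from a bare environment id gives n_envs=1, and 1 % 4 != 0, so the assert fires for any recurrent policy. Two possible workarounds:

from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv
import gym

# Either shrink nminibatches to match the single environment...
model = PPO2('MlpLstmPolicy', 'CartPole-v1', nminibatches=1)

# ...or run enough parallel environments for the default nminibatches=4.
env = DummyVecEnv([lambda: gym.make('CartPole-v1') for _ in range(4)])
model = PPO2('MlpLstmPolicy', env)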

For ACKTR, the issue is somewhere in get_factors in kfac.py. I have no clue what that function does or what goes wrong there, but it complains about shared nodes among different computation ops.

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Reactions: 3
  • Comments: 10 (1 by maintainers)

Top GitHub Comments

2 reactions
HareshKarnan commented, Feb 20, 2019

Any updates on this? TRPO still doesn’t support MlpLstmPolicy 😢

1 reaction
araffin commented, Mar 6, 2019

@HareshMiriyala for now, we don’t have time to fix that (even though it is on the roadmap). Currently, we are working on fixing GAIL + A2C; this will be merged into master soon.

However, we appreciate contributions, especially to fix that kind of thing 😉

