LSTM policies are broken for PPO1 and TRPO
See the feature/fix_lstm branch for a test that fails for the algorithms mentioned above.
For PPO1 and TRPO the cause seems to be that the batch size is not provided to the policy (None is passed); the orthogonal initializer then fails.
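A minimal numpy sketch (not the library's actual code) of why a None batch dimension breaks an orthogonal initializer of the kind the shared policies use — the SVD-based construction needs every dimension to be a concrete integer:

```python
import numpy as np

def ortho_init(shape, scale=1.0):
    # Orthogonal initialization: sample a Gaussian matrix, take its SVD,
    # and use the orthonormal factor. Every entry of `shape` must be a
    # concrete int -- a None (unknown batch size) raises a TypeError.
    flat = (shape[0], int(np.prod(shape[1:])))
    a = np.random.normal(0.0, 1.0, flat)
    u, _, v = np.linalg.svd(a, full_matrices=False)
    q = u if u.shape == flat else v  # pick the factor matching `flat`
    return (scale * q.reshape(shape)).astype(np.float32)

w = ortho_init((4, 8))   # works: concrete shape, rows are orthonormal
# ortho_init((None, 8))  # TypeError: None is not a valid dimension
```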
For PPO2 the assert in line 109 fails:

```python
assert self.n_envs % self.nminibatches == 0, "For recurrent policies, "\
    "the number of environments run in parallel should be a multiple of nminibatches."
```
The PPO2 instance is created with PPO2(policy, 'CartPole-v1'), which runs a single environment (n_envs=1), while nminibatches defaults to 4 — so 1 % 4 != 0 and the assert fires. The default parameters of PPO2 therefore seem broken for recurrent policies.
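A sketch of why the divisibility check exists, assuming (as the assert message says) that minibatches for recurrent policies are formed by splitting whole environments so hidden states stay contiguous — the function name and layout here are illustrative, not the library's code:

```python
def env_minibatches(n_envs, nminibatches):
    """Split environment indices into minibatches of whole trajectories.

    Recurrent policies carry hidden state per environment, so a minibatch
    must contain complete environments -- n_envs must divide evenly.
    """
    if n_envs % nminibatches != 0:
        raise ValueError("n_envs must be a multiple of nminibatches")
    envs_per_batch = n_envs // nminibatches
    return [list(range(i, i + envs_per_batch))
            for i in range(0, n_envs, envs_per_batch)]

env_minibatches(8, 4)    # [[0, 1], [2, 3], [4, 5], [6, 7]]
# env_minibatches(1, 4)  # ValueError -- the default-parameter failure above
```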
For ACKTR the issue is somewhere in get_factors in kfac.py. I have no idea what that function does or what goes wrong there, but it complains about shared nodes among different computation ops.
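Loosely speaking, K-FAC has to map each parameter to exactly one factor by walking the computation graph backwards from each output op; a node reachable from two different ops is "shared", which is the condition the error complains about. A hypothetical, heavily simplified sketch of that kind of bookkeeping (plain dicts standing in for the TF graph — not the actual kfac.py logic):

```python
from collections import defaultdict

def find_shared_nodes(graph, outputs):
    """graph: dict mapping op -> list of input ops; outputs: ops to trace.

    Walks backwards from each output and records which output(s) each op
    serves. Ops owned by more than one output are 'shared' nodes -- with
    an LSTM policy, the recurrent cell feeds both policy and value heads.
    """
    owners = defaultdict(set)

    def walk(op, owner):
        if owner in owners[op]:
            return  # already visited for this owner
        owners[op].add(owner)
        for parent in graph.get(op, []):
            walk(parent, owner)

    for out in outputs:
        walk(out, out)
    return {op for op, s in owners.items() if len(s) > 1}

graph = {"loss_pi": ["lstm"], "loss_v": ["lstm"], "lstm": ["input"]}
find_shared_nodes(graph, ["loss_pi", "loss_v"])  # {"lstm", "input"}
```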
Issue Analytics
- State:
- Created: 5 years ago
- Reactions: 3
- Comments: 10 (1 by maintainers)
Top Results From Across the Web
- Changelog — Stable Baselines 2.2.0 documentation
  "... refactored ACER, DDPG, GAIL, PPO1 and TRPO to fit with A2C, PPO2 and ACKTR policies; added new policies for most algorithms (Mlp, ..."
- Stable Baselines Documentation - Read the Docs
  "If you can use MPI, then you can choose between PPO1, TRPO and DDPG. ... LSTM is shared between value network and policy ..."
- Stale hidden states in PPO-LSTM - Kamal
  "Algorithms like Trust Region Policy Optimization (TRPO, Schulman et al. 2015) and PPO offer ways to update agent parameters while keeping ..."
- stable-baselines Changelog - pyup.io
  "added support for LSTM model recording to `generate_expert_traj` (XMaster96) ... fixed PPO1 and TRPO done values for recurrent policies"
- The 37 Implementation Details of Proximal Policy Optimization
  "'Ehh… I haven't read too much on PPO + LSTM' Jon admitted. ... As a refinement to Trust Region Policy Optimization (TRPO) (Schulman ..."
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Any updates on this? TRPO still doesn’t support MlpLstmPolicy 😢
@HareshMiriyala for now, we don’t have time to fix that (even though it is on the roadmap). Currently, we are working on fixing GAIL + A2C, this will be merged with master soon.
However, we appreciate contributions, especially to fix that kind of thing 😉