Basic CnnLstm policy not working with PPO on Atari Pong
See original GitHub issue.
Bug description
Simply changing the policy from CnnPolicy to CnnLstmPolicy when training PPO2 on Atari Pong makes training fail. With the standard CnnPolicy, training reaches roughly maximum performance within 10M steps.
Code
Here is the code:
import os
import gym
import numpy as np
import matplotlib.pyplot as plt
from stable_baselines.common.policies import CnnLstmPolicy
from stable_baselines import PPO2
from stable_baselines.common.cmd_util import make_atari_env
from stable_baselines.common.evaluation import evaluate_policy
# Single Atari Pong environment, without frame stacking
env = make_atari_env('PongNoFrameskip-v4', num_env=1, seed=0, wrapper_kwargs={"frame_stack": False})
# PPO2 with the recurrent CNN-LSTM policy
model = PPO2(CnnLstmPolicy, env, nminibatches=1, verbose=1, tensorboard_log="ppo2_atari_comparison")
# Train the agent
time_steps = 10000000
model.learn(total_timesteps=time_steps)
Additional notes
- Please note the result is the same whether frames are stacked or not.
- Do you have any hint to address this? On such a simple test it shouldn’t be a matter of hyperparameters… (see the sketch after this list)
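One possible avenue, sketched below under assumptions (the hyperparameter values are illustrative, not taken from this thread): stable-baselines requires that, for recurrent policies, the number of environments run in parallel be a multiple of nminibatches, and recurrent policies are usually trained with several parallel environments rather than a single one. A minimal variant of the reproduction script along those lines:
from stable_baselines.common.policies import CnnLstmPolicy
from stable_baselines import PPO2
from stable_baselines.common.cmd_util import make_atari_env

# Several parallel environments: the recurrent policy keeps one hidden state per env,
# and the number of envs must be a multiple of nminibatches (values assumed for illustration).
n_envs = 8
env = make_atari_env('PongNoFrameskip-v4', num_env=n_envs, seed=0,
                     wrapper_kwargs={"frame_stack": False})
model = PPO2(CnnLstmPolicy, env,
             nminibatches=4,  # 8 envs split into 4 minibatches, so each minibatch holds whole sequences
             verbose=1,
             tensorboard_log="ppo2_atari_comparison")
model.learn(total_timesteps=10000000)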
Issue Analytics
- Created: 3 years ago
- Comments: 11
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
For Atari and PPO specifically, here (obtained with some hyperparameter search, I believe).
Without frame-stacking:
I’m using hyperparams from the zoo (cf. the docs).
Not really, the original PPO does not have such a feature. And in my experience, it does not help that much.
I guess we can close this issue?
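Following up on the comment about zoo hyperparameters: below is a hedged sketch of a PPO2 setup with values in the spirit of the usual PPO Atari defaults (the exact numbers are assumptions for illustration, not copied from the zoo config or from this thread).
from stable_baselines.common.policies import CnnLstmPolicy
from stable_baselines import PPO2
from stable_baselines.common.cmd_util import make_atari_env

# Atari-style PPO2 setup with a recurrent policy; hyperparameter values are assumed.
env = make_atari_env('PongNoFrameskip-v4', num_env=8, seed=0)
model = PPO2(CnnLstmPolicy, env,
             n_steps=128,           # rollout length per environment
             nminibatches=4,        # must divide the number of envs for recurrent policies
             noptepochs=4,          # optimization epochs per rollout
             learning_rate=2.5e-4,
             cliprange=0.1,
             ent_coef=0.01,
             verbose=1)
model.learn(total_timesteps=10000000)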