Cannot reproduce Breakout benchmark using Double DQN
I haven't been able to reproduce the results of the Breakout benchmark with Double DQN when using hyperparameter values similar to the ones presented in the original paper. After more than 20M observed frames (~100,000 episodes), the mean 100-episode reward still hovers around 10, with a maximum value of 12.
Below are the neural network configuration and the hyperparameter values I'm using, in case I'm missing something important or getting it wrong:
# Imports assumed for this snippet (older baselines API).
import gym
from baselines import deepq
from baselines.common.atari_wrappers_deprecated import wrap_dqn, ScaledFloatFrame

env = gym.make("BreakoutNoFrameskip-v4")
env = ScaledFloatFrame(wrap_dqn(env))

model = deepq.models.cnn_to_mlp(
    convs=[(32, 8, 4), (64, 4, 2), (64, 3, 1)],
    hiddens=[512],
    dueling=False,
)

act = deepq.learn(
    env,
    q_func=model,
    lr=25e-5,
    max_timesteps=200000000,
    buffer_size=100000,  # cannot store 1M frames as the paper suggests
    exploration_fraction=1000000 / float(200000000),  # so that annealing finishes after 1M steps (see sketch below)
    exploration_final_eps=0.1,
    train_freq=4,
    batch_size=32,
    learning_starts=50000,
    target_network_update_freq=10000,
    gamma=0.99,
    prioritized_replay=False,
)
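For reference, here is a minimal sketch of what these settings imply for exploration, assuming (as I understand the baselines implementation) that epsilon is annealed linearly from 1.0 to exploration_final_eps over exploration_fraction * max_timesteps steps; epsilon_at is a hypothetical helper, not part of baselines:

# Illustrative only: assumed linear epsilon schedule implied by the settings above.
def epsilon_at(t, max_timesteps=200000000, exploration_fraction=1000000 / float(200000000),
               initial_eps=1.0, final_eps=0.1):
    schedule_timesteps = int(exploration_fraction * max_timesteps)  # 1,000,000 steps
    fraction = min(float(t) / schedule_timesteps, 1.0)
    return initial_eps + fraction * (final_eps - initial_eps)

print(epsilon_at(0))        # 1.0
print(epsilon_at(500000))   # 0.55
print(epsilon_at(1000000))  # 0.1, and it stays at 0.1 for the rest of training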
Does anyone have any idea of what is going wrong? The analogous results shown in a Jupyter notebook in openai/baselines-results indicate that I should be able to get much better scores.
Thanks in advance.
Issue Analytics
- State:
- Created: 6 years ago
- Reactions: 6
- Comments: 15 (1 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@ashishm-io Another difference is the size of the replay buffer. You might try bumping that to 1e6, because by default it's only 1e4. Note that in run_atari.py the ScaledFloatFrame wrapper is used, so 32-bit floats are used to store observations rather than 8-bit ints. In other words, you'll need a ton of memory!

@kdu4108 Yea, but Pong is the simplest of the Atari games as far as I know. In my implementation I achieve an average of over 20 in about 3 million frames. Breakout is significantly harder.
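To put the replay-buffer memory point in perspective, here is a rough back-of-the-envelope estimate (my own numbers, assuming 84x84x4 stacked frames and a buffer that stores both obs and next_obs for every transition):

# Rough replay-buffer memory estimate; assumptions as stated above.
obs_bytes_uint8 = 84 * 84 * 4            # 28,224 bytes per observation as uint8
obs_bytes_float32 = obs_bytes_uint8 * 4  # 112,896 bytes per observation as float32

def buffer_gb(n_transitions, bytes_per_obs):
    # obs + next_obs stored for each transition
    return 2 * n_transitions * bytes_per_obs / 1e9

print(buffer_gb(100000, obs_bytes_uint8))     # ~5.6 GB  (1e5 transitions, uint8)
print(buffer_gb(100000, obs_bytes_float32))   # ~22.6 GB (1e5 transitions, float32)
print(buffer_gb(1000000, obs_bytes_float32))  # ~226 GB  (1e6 transitions, float32)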
@btaba When you achieved the 250 average, that’s the actual score, right? As opposed to the clipped score? And also, is that with or without episodic life? In other words, is that an average of 250 in one life, or in 5 lives?
OpenAI team: How do we reproduce what's reported in the baselines-results repository (https://github.com/openai/baselines-results/blob/master/dqn_results.ipynb)? It shows average scores of 400+; however, it references files that no longer exist, like wang2015_eval.py. I'm using the run_atari.py script, with dueling off but otherwise default, and getting an average of just over 18 after 10M frames (the default). I'm trying to implement DQN, but most of the code I find online has subtle bugs. It's important to have something out there to reference that has reproducible results!

File "train.py", line 244, in <module>
    start_time, start_steps = time.time(), info['steps']
KeyError: 'steps'
How do I get rid of this error when trying to run atari/train.py?