Cannot reproduce Breakout benchmark using Double DQN

I haven't been able to reproduce the results of the Breakout benchmark with Double DQN when using hyperparameter values similar to the ones presented in the original paper. After more than 20M observed frames (~100,000 episodes), the mean 100-episode reward still hovers around 10, with a maximum of 12.

Below are the neural network configuration and the hyperparameter values I'm using, in case I'm missing or getting something important wrong:

# Imports added for completeness; the wrapper import path matches the
# baselines version of that era and may differ in newer checkouts.
import gym
from baselines import deepq
from baselines.common.atari_wrappers_deprecated import wrap_dqn, ScaledFloatFrame

env = gym.make("BreakoutNoFrameskip-v4")
env = ScaledFloatFrame(wrap_dqn(env))
model = deepq.models.cnn_to_mlp(
        convs=[(32, 8, 4), (64, 4, 2), (64, 3, 1)],
        hiddens=[512],
        dueling=False
)
act = deepq.learn(
        env,
        q_func=model,
        lr=25e-5,
        max_timesteps=200000000,
        buffer_size=100000,  # cannot store 1M frames as the paper suggests
        exploration_fraction=1000000/float(200000000),  # so that epsilon annealing finishes after 1M steps
        exploration_final_eps=0.1,
        train_freq=4,
        batch_size=32,
        learning_starts=50000,
        target_network_update_freq=10000,
        gamma=0.99,
        prioritized_replay=False
)

Does anyone have an idea of what is going wrong? The analogous results shown in a Jupyter notebook in openai/baselines-results indicate that I should be able to get much better scores.

Thanks in advance.

Issue Analytics

  • State: open
  • Created: 6 years ago
  • Reactions: 6
  • Comments: 15 (1 by maintainers)

Top GitHub Comments

4 reactions
benbotto commented, May 14, 2018

@ashishm-io Another difference is the size of the replay buffer. You might try bumping that to 1e6, because by default it's only 1e4. Note that run_atari.py uses the ScaledFloatFrame wrapper, so observations are stored as 32-bit floats rather than 8-bit ints. In other words, you'll need a ton of memory!
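
For a rough sense of the memory involved, here is a back-of-the-envelope estimate, assuming the usual 84x84 grayscale observations with 4 stacked frames (the exact total also depends on whether the buffer stores next-observations separately, which roughly doubles it):

# Rough replay-buffer memory estimate (sketch; assumes 84x84x4 stacked observations).
values_per_obs = 84 * 84 * 4              # 28,224 values per observation

bytes_uint8 = values_per_obs * 1          # 8-bit ints (raw frames)
bytes_float32 = values_per_obs * 4        # 32-bit floats (ScaledFloatFrame)

for capacity in (int(1e4), int(1e5), int(1e6)):
    gb_uint8 = capacity * bytes_uint8 / 1e9
    gb_float32 = capacity * bytes_float32 / 1e9
    print("%9d transitions: ~%5.1f GB as uint8 vs ~%6.1f GB as float32"
          % (capacity, gb_uint8, gb_float32))

At 1e6 transitions that works out to roughly 28 GB as uint8 versus roughly 113 GB as float32, before counting next-observations.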

@kdu4108 Yea, but Pong is the simplest of the Atari games as far as I know. In my implementation I achieve an average of over 20 in about 3 million frames. Breakout is significantly harder.

@btaba When you achieved the 250 average, that’s the actual score, right? As opposed to the clipped score? And also, is that with or without episodic life? In other words, is that an average of 250 in one life, or in 5 lives?
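
For readers following along: the standard DQN Atari wrappers clip rewards to their sign and treat every life lost as an episode boundary, so the number logged during training is typically the clipped, single-life return rather than the real game score. One way to keep track of the real score is to wrap the raw env before the DQN wrappers are applied; a minimal sketch (the RawScoreTracker name and the raw_score info key are made up for illustration):

import gym


class RawScoreTracker(gym.Wrapper):
    """Accumulate the unclipped game score across all lives (sketch).

    Apply this to the raw env before reward clipping and the
    episodic-life wrapper, so info['raw_score'] reflects the true
    5-life score even though the agent trains on clipped rewards.
    """

    def __init__(self, env):
        gym.Wrapper.__init__(self, env)
        self.score = 0.0

    def reset(self, **kwargs):
        self.score = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.score += reward                    # unclipped reward from the game
        info = dict(info, raw_score=self.score)
        return obs, reward, done, info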

OpenAI team: How do we reproduce what’s reported in the baselines-results repository (https://github.com/openai/baselines-results/blob/master/dqn_results.ipynb)? It shows average scores of 400+; however, it references files that no longer exist, like wang2015_eval.py. I’m using the run_atari.py script, with dueling off but otherwise default, and getting an average of just over 18 after 10M frames (the default). I’m trying to implement DQN, but most of the code I find online has subtle bugs. It’s important to have something out there to reference that has reproducible results!
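
Absent wang2015_eval.py, one way to get comparable numbers is to run the trained policy on a separate evaluation env that keeps the raw rewards (no reward clipping, no episodic life) and average the score over a batch of full games. A rough sketch, assuming the old baselines deepq API in which learn()/load() return an act function callable as act(obs[None])[0]; the evaluation wrapper choice is an assumption:

import numpy as np


def evaluate(act, eval_env, n_episodes=100):
    """Average the unclipped, full-game score over n_episodes (sketch).

    eval_env should be wrapped for observations only (frame skip, warp,
    frame stack, float scaling), without reward clipping or the
    episodic-life wrapper, so each episode is a complete game.
    """
    scores = []
    for _ in range(n_episodes):
        obs, done, score = eval_env.reset(), False, 0.0
        while not done:
            action = act(np.array(obs)[None])[0]   # action from the trained network
            obs, reward, done, _ = eval_env.step(action)
            score += reward
        scores.append(score)
    return np.mean(scores), np.std(scores)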

3 reactions
BNSneha commented, Nov 25, 2017

File "train.py", line 244, in <module>
    start_time, start_steps = time.time(), info['steps']
KeyError: 'steps'

How do I get rid of this error when trying to run atari/train.py?
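
The traceback suggests that the info dict returned at that point in the old atari/train.py no longer contains a 'steps' key. One defensive workaround, assuming it is acceptable to start the counter at 0 when the key is missing, is to give that line a default:

# around line 244 of the old atari/train.py (hypothetical workaround;
# assumes a missing 'steps' key simply means training starts from step 0)
start_time, start_steps = time.time(), info.get('steps', 0)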
