Cannot reproduce Breakout benchmark using Double DQN
I haven't been able to reproduce the results of the Breakout benchmark with Double DQN when using hyperparameter values similar to the ones presented in the original paper. After more than 20M observed frames (~100,000 episodes), the mean 100-episode reward still hovers around 10, with a maximum value of 12.
Below are the neural network configuration and the hyperparameter values I'm using, in case I'm missing something important or getting it wrong:
# Imports assumed for this snippet (older baselines API).
import gym
from baselines import deepq
from baselines.common.atari_wrappers_deprecated import wrap_dqn, ScaledFloatFrame

env = gym.make("BreakoutNoFrameskip-v4")
env = ScaledFloatFrame(wrap_dqn(env))

model = deepq.models.cnn_to_mlp(
    convs=[(32, 8, 4), (64, 4, 2), (64, 3, 1)],
    hiddens=[512],
    dueling=False,
)

act = deepq.learn(
    env,
    q_func=model,
    lr=25e-5,
    max_timesteps=200000000,
    buffer_size=100000,  # cannot store 1M frames as the paper suggests
    exploration_fraction=1000000 / float(200000000),  # so that annealing finishes after 1M steps (see sketch below)
    exploration_final_eps=0.1,
    train_freq=4,
    batch_size=32,
    learning_starts=50000,
    target_network_update_freq=10000,
    gamma=0.99,
    prioritized_replay=False,
)
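For reference, here is a minimal sketch of what these settings imply for exploration, assuming (as I understand the baselines implementation) that epsilon is annealed linearly from 1.0 to exploration_final_eps over exploration_fraction * max_timesteps steps; epsilon_at is a hypothetical helper, not part of baselines:

# Illustrative only: assumed linear epsilon schedule implied by the settings above.
def epsilon_at(t, max_timesteps=200000000, exploration_fraction=1000000 / float(200000000),
               initial_eps=1.0, final_eps=0.1):
    schedule_timesteps = int(exploration_fraction * max_timesteps)  # 1,000,000 steps
    fraction = min(float(t) / schedule_timesteps, 1.0)
    return initial_eps + fraction * (final_eps - initial_eps)

print(epsilon_at(0))        # 1.0
print(epsilon_at(500000))   # 0.55
print(epsilon_at(1000000))  # 0.1, and it stays at 0.1 for the rest of training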
Does anyone have any idea of what is going wrong? The analogous results shown in a Jupyter notebook in openai/baselines-results indicate that I should be able to get much better scores.
Thanks in advance.
Issue Analytics
- State:
- Created: 6 years ago
- Reactions: 6
- Comments: 15 (1 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@ashishm-io Another difference is the size of the replay buffer. You might try bumping that to 1e6, because by default it's only 1e4. Note that in run_atari.py the ScaledFloatFrame wrapper is used, so 32-bit floats are used to store observations rather than 8-bit ints. In other words, you'll need a ton of memory!

@kdu4108 Yea, but Pong is the simplest of the Atari games as far as I know. In my implementation I achieve an average of over 20 in about 3 million frames. Breakout is significantly harder.
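To put the replay-buffer memory point in perspective, here is a rough back-of-the-envelope estimate (my own numbers, assuming 84x84x4 stacked frames and a buffer that stores both obs and next_obs for every transition):

# Rough replay-buffer memory estimate; assumptions as stated above.
obs_bytes_uint8 = 84 * 84 * 4            # 28,224 bytes per observation as uint8
obs_bytes_float32 = obs_bytes_uint8 * 4  # 112,896 bytes per observation as float32

def buffer_gb(n_transitions, bytes_per_obs):
    # obs + next_obs stored for each transition
    return 2 * n_transitions * bytes_per_obs / 1e9

print(buffer_gb(100000, obs_bytes_uint8))     # ~5.6 GB  (1e5 transitions, uint8)
print(buffer_gb(100000, obs_bytes_float32))   # ~22.6 GB (1e5 transitions, float32)
print(buffer_gb(1000000, obs_bytes_float32))  # ~226 GB  (1e6 transitions, float32)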
@btaba When you achieved the 250 average, that’s the actual score, right? As opposed to the clipped score? And also, is that with or without episodic life? In other words, is that an average of 250 in one life, or in 5 lives?
OpenAI team: How do we reproduce what's reported in the baselines-results repository (https://github.com/openai/baselines-results/blob/master/dqn_results.ipynb)? It shows average scores of 400+; however, it references files that no longer exist, like wang2015_eval.py. I'm using the run_atari.py script, with dueling off but otherwise default, and getting an average of just over 18 after 10M frames (the default). I'm trying to implement DQN, but most of the code I find online has subtle bugs. It's important to have something out there to reference that has reproducible results!

File "train.py", line 244, in <module>
    start_time, start_steps = time.time(), info['steps']
KeyError: 'steps'
How do I get rid of this error when trying to run atari/train.py?