DQN is not converging even after 15M timesteps
Question
I am training Pong-v4/PongNoFrameskip-v4 with DQN. The mean episode reward stays around -20 to -21 even after 1.5e7 timesteps. I tried various hyperparameters for DQN, but it keeps giving the same result, and I could not find hyperparameters that work. I think it is a problem with DQN.
Additional context
At the beginning of training the agent starts at around -20.4 to -20.2. After 3e6 timesteps it drops to -21 and then fluctuates in a range between -20.8 and -21.
I tried the following variants of DQN, experimenting with different combinations of: learning_starts in [50k (default), 5k, 100k], gamma in [0.98, 0.99, 0.999], exploration_final_eps in [0.02, 0.05], learning_rate in [1e-3, 1e-4, 5e-4], and buffer_size in [50k, 500k, 1000k].
These combinations were applied to the code below.
from stable_baselines3 import DQN  # env: the Atari environment created beforehand

model = DQN('CnnPolicy', env, verbose=1, learning_starts=50000, gamma=0.98, exploration_final_eps=0.02, learning_rate=1e-3)
model.learn(total_timesteps=int(1.5e7), log_interval=10)
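For reference, a minimal sketch of how such a sweep could be scripted (the env construction with make_atari_env/VecFrameStack and the loop itself are illustrative assumptions, not my exact script):

from itertools import product

from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# Sweep a subset of the combinations listed above
for lr, gamma, buf in product([1e-3, 5e-4, 1e-4], [0.98, 0.99, 0.999], [50_000, 500_000, 1_000_000]):
    env = VecFrameStack(make_atari_env('PongNoFrameskip-v4', n_envs=1), n_stack=4)
    model = DQN('CnnPolicy', env, verbose=1, learning_starts=50_000, gamma=gamma,
                learning_rate=lr, buffer_size=buf, exploration_final_eps=0.02)
    model.learn(total_timesteps=int(1.5e7), log_interval=10)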
Since I have already tried the combinations mentioned above, I am inclined to think there is a bug in the DQN implementation.
Checklist
- I have read the documentation (required)
- I have checked that there is no similar issue in the repo (required)
Issue Analytics
- Created 3 years ago
- Comments: 8 (4 by maintainers)
Top GitHub Comments
Have you tried using the parameters and/or other code from the zoo repository? I recently used parameters from the SB2 zoo (without prioritization/dueling/etc.) when matching the performance, and things worked out as expected (see #110).
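For example, a rough sketch along those lines (the hyperparameter values below are assumptions in the spirit of the zoo's Atari DQN config, not copied from the repo; check the zoo for the exact settings):

from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# AtariWrapper preprocessing via make_atari_env, plus 4-frame stacking
env = VecFrameStack(make_atari_env('PongNoFrameskip-v4', n_envs=1, seed=0), n_stack=4)

model = DQN('CnnPolicy', env, verbose=1,
            buffer_size=100_000,
            learning_rate=1e-4,
            learning_starts=100_000,
            batch_size=32,
            train_freq=4,
            gradient_steps=1,
            target_update_interval=1_000,
            exploration_fraction=0.1,
            exploration_final_eps=0.01)
model.learn(total_timesteps=int(1e7), log_interval=100)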
See the doc sections “tips and tricks” and “reproducibility”:
One thing you can do is increase the replay buffer size to 1e5 or 1e6 (if it fits in your RAM). I think I may have forgotten to set it back to a higher value, even though the smaller buffer seems to work in most cases, cf. the benchmark.
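As a rough sketch of what that looks like (the RAM figures are back-of-the-envelope estimates, not measurements):

from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

env = VecFrameStack(make_atari_env('PongNoFrameskip-v4', n_envs=1), n_stack=4)

# 1e6 transitions of 4x84x84 uint8 frames is about 1e6 * 4 * 84 * 84 bytes ~= 28 GB
# per stored array (obs and next_obs each), so optimize_memory_usage=True, which keeps
# a single shared array, roughly halves the footprint.
model = DQN('CnnPolicy', env, verbose=1, buffer_size=1_000_000, optimize_memory_usage=True)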