DQN is not converging even after 15M timesteps

See original GitHub issue

Question

I am training Pong-v4 / PongNoFrameskip-v4 with DQN. The mean episode reward stays around -20 to -21 even after 1.5e7 timesteps. I have tried various hyperparameters for DQN and it still gives the same output. I could not find proper hyperparameters for DQN, so I suspect there is a problem with DQN.

Additional context

At the beginning of training the agent scores around -20.4 to -20.2. After 3e6 timesteps the reward drops to -21 and then fluctuates between -20.8 and -21.

I experimented with different combinations of the following DQN settings: learning_starts in [50k (default), 5k, 100k], gamma in [0.98, 0.99, 0.999], exploration_final_eps in [0.02, 0.05], learning_rate in [1e-3, 1e-4, 5e-4], and buffer_size in [50k, 500k, 1000k].

One such combination is applied in the code below; a minimal sketch of how the full sweep could be scripted follows the code.

from stable_baselines3 import DQN  # env is an Atari Pong environment, e.g. PongNoFrameskip-v4

model = DQN('CnnPolicy', env, verbose=1, learning_starts=50000, gamma=0.98, exploration_final_eps=0.02, learning_rate=1e-3)
model.learn(total_timesteps=int(1.5e7), log_interval=10)
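For illustration only, here is a minimal sketch of how the sweep over the combinations listed above could be scripted, assuming stable_baselines3 and an already-created Atari env; the shortened per-run budget is a placeholder for screening, not a recommendation:

from itertools import product
from stable_baselines3 import DQN

# Hyperparameter values listed in the question (illustrative sweep only).
grid = {
    "learning_starts": [50_000, 5_000, 100_000],
    "gamma": [0.98, 0.99, 0.999],
    "exploration_final_eps": [0.02, 0.05],
    "learning_rate": [1e-3, 1e-4, 5e-4],
    "buffer_size": [50_000, 500_000, 1_000_000],
}
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    model = DQN("CnnPolicy", env, verbose=0, **params)
    model.learn(total_timesteps=500_000)  # shortened budget per combination, for screening only
    print(params)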

Since I have already tried the combinations mentioned above, I am inclined to think there is a bug in the DQN implementation.

Checklist

  • I have read the documentation (required)
  • I have checked that there is no similar issue in the repo (required)

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
Miffyli commented, Nov 3, 2020

Have you tried using the parameters and/or other code from the zoo repository? I recently used parameters from the SB2 zoo (without prioritization/dueling/etc.) when matching performance, and things worked out as expected (see #110).
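For context, here is a hedged sketch of what a zoo-style Atari DQN setup looks like in stable_baselines3 code. The numeric values below are assumptions based on the zoo's Atari defaults; check hyperparams/dqn.yml in the rl-baselines3-zoo repository for the authoritative settings:

from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# Atari preprocessing (frame skip, grayscale, resize) plus 4-frame stacking.
env = make_atari_env("PongNoFrameskip-v4", n_envs=1, seed=0)
env = VecFrameStack(env, n_stack=4)

# Assumed zoo-style hyperparameters; verify against the zoo's dqn.yml.
model = DQN(
    "CnnPolicy",
    env,
    buffer_size=100_000,
    learning_rate=1e-4,
    learning_starts=100_000,
    batch_size=32,
    target_update_interval=1000,
    train_freq=4,
    gradient_steps=1,
    exploration_fraction=0.1,
    exploration_final_eps=0.01,
    verbose=1,
)
model.learn(total_timesteps=int(1e7), log_interval=10)

In practice the Atari preprocessing and frame stacking tend to matter at least as much as the numeric hyperparameters, so the wrappers above are worth checking first.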

0 reactions
araffin commented, Apr 5, 2021

I have rerun my experiments with different seeds and see a weird result.

See the documentation sections “Tips and Tricks” and “Reproducibility”:

One thing you can do is increase the replay buffer size to 1e5 or 1e6 (if it fits in your RAM). I think I may have forgotten to set it back to a higher value, even though it seems to work in most cases (cf. benchmark).
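As a hedged illustration of that suggestion (SB3 DQN parameter names; the 1e6 value only makes sense if the replay buffer fits in your RAM):

# Illustrative only: larger replay buffer, as suggested above.
model = DQN(
    "CnnPolicy",
    env,
    buffer_size=1_000_000,  # 1e6 transitions; for Atari frames this needs a lot of RAM
    verbose=1,
)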

Read more comments on GitHub >

Top Results From Across the Web

  • DQN - Q-Loss not converging - Stack Overflow
    I think it's normal that the Q-loss is not converging as your data keeps changing when your policy updates.
  • Why does the DQN not converge when the start or goal states ...
    I can easily get the DQN to converge on a static environment, but I am having trouble with a dynamic environment where the...
  • Rainbow DQN — The Best Reinforcement Learning Has to ...
    Even if rewards are noisy and do not directly aid in a converging expected value, we can still utilize them to get a...
  • DQN model won't converge : r/reinforcementlearning - Reddit
    My model won't converge (I suspect it's because I'm not batch training but I'm not sure) and I wanted to get some inputs...
  • Improving the efficiency of reinforcement learning for a ...
    The fundamental disadvantage of using NNs within RL is that there is no proof of a guaranteed convergence as there is with tabular...
