[Feature Request] Double DQN
### 🚀 Feature
Add a Double DQN variant of the DQN algorithm.
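For context, the change amounts to switching the bootstrap target: vanilla DQN takes the maximum over the target network's own Q-values, while Double DQN (van Hasselt et al., 2016) selects the action with the online network and evaluates it with the target network:

```latex
% Vanilla DQN target
y_t = r_t + \gamma \max_{a'} Q_{\theta^-}(s_{t+1}, a')
% Double DQN target: action chosen by the online network Q_\theta,
% value taken from the target network Q_{\theta^-}
y_t = r_t + \gamma \, Q_{\theta^-}\bigl(s_{t+1}, \operatorname*{arg\,max}_{a'} Q_{\theta}(s_{t+1}, a')\bigr)
```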
### Motivation
It’s in the roadmap https://github.com/DLR-RM/stable-baselines3/issues/1.
### Pitch
I suggest we go from:
```python
with th.no_grad():
    # Compute the next Q-values using the target network
    next_q_values = self.q_net_target(replay_data.next_observations)
    # Follow greedy policy: use the one with the highest value
    next_q_values, _ = next_q_values.max(dim=1)
```
to:
```python
with th.no_grad():
    # Compute the next Q-values using the target network
    next_q_values = self.q_net_target(replay_data.next_observations)
    if self.double_dqn:
        # Use the current model to select the action with the maximal Q-value
        max_actions = th.argmax(self.q_net(replay_data.next_observations), dim=1)
        # Evaluate the Q-value of that action using the fixed target network
        next_q_values = th.gather(next_q_values, dim=1, index=max_actions.unsqueeze(-1))
    else:
        # Follow greedy policy: use the one with the highest value
        next_q_values, _ = next_q_values.max(dim=1)
```
with `double_dqn` as an additional flag passed to the `DQN` constructor.
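Below is a minimal sketch of what this would look like, assuming the (hypothetical) `double_dqn` keyword argument were added to `DQN`; the tensor example only illustrates the `argmax`/`gather` mechanics of the proposed target computation with dummy data and is not actual SB3 code:

```python
import torch as th

# Hypothetical usage once the flag exists (double_dqn is NOT part of the current SB3 API):
# from stable_baselines3 import DQN
# model = DQN("MlpPolicy", "CartPole-v1", double_dqn=True, verbose=1)
# model.learn(total_timesteps=100_000)

# Standalone illustration of the proposed target computation with dummy tensors
batch_size, n_actions, gamma = 4, 3, 0.99
rewards = th.rand(batch_size, 1)
dones = th.zeros(batch_size, 1)

# Stand-ins for q_net(next_observations) and q_net_target(next_observations)
online_q = th.rand(batch_size, n_actions)
target_q = th.rand(batch_size, n_actions)

with th.no_grad():
    # Double DQN: select the greedy action with the online network ...
    max_actions = th.argmax(online_q, dim=1)
    # ... but evaluate it with the target network
    next_q_values = th.gather(target_q, dim=1, index=max_actions.unsqueeze(-1))  # (batch_size, 1)
    # 1-step TD target, mirroring the existing computation in DQN.train()
    target_q_values = rewards + (1 - dones) * gamma * next_q_values

print(target_q_values.shape)  # torch.Size([4, 1])
```

Selecting the action with `self.q_net` but evaluating it with `self.q_net_target` is what decouples action selection from action evaluation and reduces the overestimation bias that Double DQN is designed to address.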
### Checklist
- [x] I have checked that there is no similar issue in the repo (required)
### Top GitHub Comments
As a lecturer in an AI program, I recommend that our students use Stable Baselines for their projects because of its ease of use and clear documentation (thanks for that!). From an educational point of view, Q-learning and DQN are a good introduction to RL, so students start off using DQN. Results using DQN in SB3 are much, much worse than in SB2 (both with default values for the parameters), which hampers the adoption of SB3 (and the enthusiasm of students for RL). I have not yet understood or investigated the reason for this difference; obvious candidates are the missing extensions like PER and DDQN, but of course that is an assumption. The goal of this comment is just to mention that progress in SB3 on this topic is much appreciated. If I can be of help, for example in testing improvements, let me know. Best regards, Erco Argante
Sorry for the long inactivity. I managed to run a few experiments with the proposed change on Pong and Breakout; I'll leave the training curves here, although I can't notice much of a difference (which is at least on par with the original paper's findings).
[Training curve plots: Vanilla DQN Pong, Vanilla DQN Breakout, Double DQN Pong, Double DQN Breakout]