[Feature Request] Double DQN

🚀 Feature

Add a Double DQN variant of the DQN algorithm.

Motivation

It’s on the roadmap: https://github.com/DLR-RM/stable-baselines3/issues/1.
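
For background, Double DQN (van Hasselt et al., 2016, “Deep Reinforcement Learning with Double Q-learning”) reduces the overestimation bias of vanilla DQN by decoupling action selection from action evaluation: the online network (weights \theta) picks the next action, while the target network (weights \theta^-) evaluates it. Only the TD target changes:

y_\text{DQN}    = r + \gamma \max_{a'} Q_{\theta^-}(s', a')
y_\text{Double} = r + \gamma \, Q_{\theta^-}\big(s', \arg\max_{a'} Q_{\theta}(s', a')\big)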

Pitch

I suggest we go from:

with th.no_grad():
    # Compute the next Q-values using the target network
    next_q_values = self.q_net_target(replay_data.next_observations)
    # Follow greedy policy: use the one with the highest value
    next_q_values, _ = next_q_values.max(dim=1)

to:

with th.no_grad():
    # Compute the next Q-values using the target network
    next_q_values = self.q_net_target(replay_data.next_observations)
    if self.double_dqn:
        # Use the current (online) network to select the action with the maximal Q-value
        max_actions = th.argmax(self.q_net(replay_data.next_observations), dim=1)
        # Evaluate the Q-value of that action using the frozen target network
        next_q_values = th.gather(next_q_values, dim=1, index=max_actions.unsqueeze(-1))
    else:
        # Follow greedy policy: use the one with the highest value
        next_q_values, _ = next_q_values.max(dim=1)

with double_dqn as an additional flag passed to the DQN constructor.
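
To make the pitch concrete, here is a minimal sketch of how the flag could be wired up by subclassing SB3’s DQN and overriding train(). This is illustrative only: the DoubleDQN class and the double_dqn argument are hypothetical (they are not part of the SB3 API), and the surrounding loop mirrors the structure of DQN.train() rather than reproducing it exactly:

import numpy as np
import torch as th
import torch.nn.functional as F
from stable_baselines3 import DQN


class DoubleDQN(DQN):  # hypothetical subclass, not part of SB3
    def __init__(self, *args, double_dqn: bool = True, **kwargs):
        super().__init__(*args, **kwargs)
        self.double_dqn = double_dqn

    def train(self, gradient_steps: int, batch_size: int = 100) -> None:
        self.policy.set_training_mode(True)
        self._update_learning_rate(self.policy.optimizer)
        losses = []
        for _ in range(gradient_steps):
            # Sample a batch of transitions from the replay buffer
            replay_data = self.replay_buffer.sample(batch_size, env=self._vec_normalize_env)
            with th.no_grad():
                next_q_values = self.q_net_target(replay_data.next_observations)
                if self.double_dqn:
                    # Online network selects the action, target network evaluates it
                    max_actions = th.argmax(self.q_net(replay_data.next_observations), dim=1)
                    next_q_values = th.gather(next_q_values, dim=1, index=max_actions.unsqueeze(-1))
                else:
                    # Vanilla DQN: target network both selects and evaluates
                    next_q_values, _ = next_q_values.max(dim=1)
                    next_q_values = next_q_values.reshape(-1, 1)
                # 1-step TD target
                target_q_values = replay_data.rewards + (1 - replay_data.dones) * self.gamma * next_q_values
            # Q-values for the actions that were actually taken
            current_q_values = th.gather(self.q_net(replay_data.observations), dim=1, index=replay_data.actions.long())
            loss = F.smooth_l1_loss(current_q_values, target_q_values)
            losses.append(loss.item())
            self.policy.optimizer.zero_grad()
            loss.backward()
            th.nn.utils.clip_grad_norm_(self.policy.parameters(), self.max_grad_norm)
            self.policy.optimizer.step()
        self._n_updates += gradient_steps
        self.logger.record("train/n_updates", self._n_updates, exclude="tensorboard")
        self.logger.record("train/loss", np.mean(losses))


# Example usage (CartPole just as a smoke test)
model = DoubleDQN("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10_000)

Subclassing keeps the change opt-in without touching the upstream train() signature.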

Checklist

  • [x] I have checked that there is no similar issue in the repo (required)

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 18 (7 by maintainers)

Top GitHub Comments

2 reactions
ercoargante commented, Oct 22, 2021

As a lecturer in an AI program, I recommend that our students use Stable Baselines for their projects because of its ease of use and clear documentation (thanks for that!). From an educational point of view, Q-learning and DQN are a good introduction to RL, so students start off using DQN. Results with SB3’s DQN are much, much worse than with SB2 (both with default parameter values), which hampers the adoption of SB3 (and the enthusiasm of students for RL). I have not yet understood or investigated the reason for this difference; obvious candidates are the missing extensions such as PER and Double DQN, but of course that is an assumption. The goal of this comment is just to mention that progress in SB3 on this topic is much appreciated. If I can be of help, for example in testing improvements, let me know. Best regards, Erco Argante

2 reactions
NickLucche commented, Sep 16, 2021

Sorry for the long inactivity. I managed to run a few experiments with the proposed change on Pong and Breakout. I’ll leave the training curves here, although I can’t see much of a difference (which is at least on par with the original paper’s findings).

Vanilla DQN Pong: [training episodic reward curve]

Vanilla DQN Breakout: [training episodic reward curve]

Double DQN Pong: [training episodic reward curve]

Double DQN Breakout: [training episodic reward curve]
