Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

What is a DQN policy?

See original GitHub issue

In the module stable_baselines.deepq.policies, you have a lot of policies for DQN. This is all very nice, but Q-learning (and DQN) is an off-policy, value-based method: we learn a state-action value function, and the behaviour policy (e.g. epsilon-greedy) is derived from that value function, so the policy itself needs no parameters apart from the Greek letter. So what do these "policies" represent? Are they actually the state-action value function Q(s, a) (in the case of DQN, a neural network)? That's the only thing in DQN that has parameters anyway, so I guess that all those "policies" there are actually neural networks that represent state-action value functions Q(s, a). Right? The documentation is as useless as it could ever be:

Policy object that implements a DQN policy

Yes, I did not look at the code yet.
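To make the distinction in the question concrete, here is a minimal, self-contained sketch (plain Python, hypothetical function name, not the stable-baselines API) of how an epsilon-greedy behaviour policy is derived from a Q-function: all learned parameters live in whatever produces the Q-values, and the "policy" on top is just argmax plus exploration.

```python
import random

def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Pick an action given the Q-values Q(s, .) for one state.

    With probability epsilon, explore uniformly at random; otherwise
    exploit by taking argmax_a Q(s, a). Note the policy itself has no
    learnable parameters beyond epsilon -- everything learnable lives
    in the function that produced q_values (in DQN, a neural network).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Example: Q(s, .) for a state with 3 actions
q = [0.1, 2.5, -0.3]
print(epsilon_greedy_action(q, epsilon=0.0))  # epsilon=0 is fully greedy -> 1
```

With epsilon = 0 the call is deterministic and always returns the greedy action; any exploration comes only from the epsilon branch.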

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 15

Top GitHub Comments

2 reactions
araffin commented, Sep 16, 2021

Note: DQN only has a Q-network, so it would be: net_arch=[]
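In other words, a DQN "policy" object is essentially a container for the Q-network plus an action-selection rule. A rough sketch of that idea (hypothetical class, not the stable-baselines implementation):

```python
import random

class DQNPolicy:
    """Sketch: a DQN 'policy' is really a Q-function plus argmax.

    q_func maps an observation to a list of Q-values, one per action.
    All learnable parameters live in q_func (the Q-network); the policy
    object only adds epsilon-greedy action selection on top of it.
    """
    def __init__(self, q_func, n_actions, epsilon=0.05):
        self.q_func = q_func
        self.n_actions = n_actions
        self.epsilon = epsilon

    def predict(self, obs, deterministic=False):
        if not deterministic and random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        q = self.q_func(obs)
        return max(range(self.n_actions), key=lambda a: q[a])

# Toy Q-function: prefers the action equal to obs modulo 2
policy = DQNPolicy(lambda obs: [1.0 if a == obs % 2 else 0.0
                                for a in range(2)], n_actions=2)
print(policy.predict(3, deterministic=True))  # argmax of [0.0, 1.0] -> 1
```

This matches the intuition in the question: the "policy" classes name the network architecture used for Q(s, a), not a separately parameterised policy.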

1 reaction
Miffyli commented, Sep 16, 2021
Read more comments on GitHub >

Top Results From Across the Web

DQN — Stable Baselines 2.10.3a0 documentation
DQN Policies¶ · obs – (np.ndarray float or int) The current observation of the environment · state – (np.ndarray float) The last states...
What is difference between DQN and Policy Gradient methods?
DQN is a form of Q-learning with function approximation (using a neural network ), ... In contrast, policy gradient methods try to learn...
Deep Q Network vs Policy Gradients - Felix Yu
A close variant called Double DQN (DDQN) basically uses 2 neural networks to perform the Bellman iteration, one for generating the prediction ...
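The Double DQN trick mentioned above can be sketched in a few lines: the online network selects the next action, while the second (target) network evaluates it. A plain-Python sketch with hypothetical names, assuming Q-values are passed in as lists:

```python
def double_dqn_target(reward, next_q_online, next_q_target,
                      gamma=0.99, done=False):
    """Bellman target for Double DQN.

    next_q_online: Q-values for s' from the online network (selects the action).
    next_q_target: Q-values for s' from the target network (evaluates it).
    Decoupling selection from evaluation reduces the overestimation
    bias of the vanilla DQN max operator.
    """
    if done:
        return reward  # terminal state: no bootstrapped future value
    best_action = max(range(len(next_q_online)),
                      key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]

# Online net picks action 1; target net evaluates it: 1.0 + 0.9 * 0.4 = 1.36
target = double_dqn_target(1.0, [0.2, 0.8], [0.5, 0.4], gamma=0.9)
```

Vanilla DQN would instead use `max(next_q_target)` directly, letting the same network both pick and score the action.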
DQN Explained - Papers With Code
It is usually used in conjunction with Experience Replay, for storing the episode steps in memory for off-policy learning, where samples are drawn...
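The experience replay mentioned in that snippet is just a bounded store of transitions sampled uniformly at random for off-policy updates. A minimal sketch (hypothetical class, not a specific library's implementation):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done)
    transitions. Sampling uniformly at random breaks the temporal
    correlation between consecutive environment steps."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)  # oldest entry is evicted when full

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=2)
for t in [("s0", 0, 0.0, "s1", False),
          ("s1", 1, 1.0, "s2", False),
          ("s2", 0, 0.5, "s3", True)]:
    buf.add(t)
print(len(buf))  # capacity is 2, so the oldest transition was dropped -> 2
```

Because the buffer holds transitions from many past behaviour policies, learning from it is inherently off-policy, which is exactly why it pairs naturally with Q-learning.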
Chapter 4. Learning to pick the best policy: Policy gradient ...
What if we skip selecting a policy on top of the DQN and instead train a neural network to output an action directly?...
