Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

What is a DQN policy?

See original GitHub issue

In the module stable_baselines.deepq.policies, you have a lot of policies for DQN. This is all very nice, but Q-learning (and DQN) is an off-policy, value-based method: we learn a state-action value function, and the behaviour policy (e.g. epsilon-greedy) is derived from that value function, so the policy itself needs no parameters apart from the Greek letter. So what do these "policies" represent? Are they actually the state-action value function Q(s, a) (in the case of DQN, a neural network)? That's the only thing in DQN that has parameters anyway, so I guess that all those "policies" there are actually neural networks that represent state-action value functions Q(s, a). Right? The documentation is as useless as it could ever be:

Policy object that implements a DQN policy

Yes, I did not look at the code yet.
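To make the distinction in the question concrete, here is a minimal, self-contained sketch (plain Python, hypothetical function name, not the stable-baselines API) of how an epsilon-greedy behaviour policy is derived from a Q-function: all learned parameters live in whatever produces the Q-values, and the "policy" on top is just argmax plus exploration.

```python
import random

def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Pick an action given the Q-values Q(s, .) for one state.

    With probability epsilon, explore uniformly at random; otherwise
    exploit by taking argmax_a Q(s, a). Note the policy itself has no
    learnable parameters beyond epsilon -- everything learnable lives
    in the function that produced q_values (in DQN, a neural network).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Example: Q(s, .) for a state with 3 actions
q = [0.1, 2.5, -0.3]
print(epsilon_greedy_action(q, epsilon=0.0))  # epsilon=0 is fully greedy -> 1
```

With epsilon = 0 the call is deterministic and always returns the greedy action; any exploration comes only from the epsilon branch.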

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 15

Top GitHub Comments

2 reactions
araffin commented, Sep 16, 2021

Note: DQN only has a Q-network, so it would be: net_arch=[]
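In other words, a DQN "policy" object is essentially a container for the Q-network plus an action-selection rule. A rough sketch of that idea (hypothetical class, not the stable-baselines implementation):

```python
import random

class DQNPolicy:
    """Sketch: a DQN 'policy' is really a Q-function plus argmax.

    q_func maps an observation to a list of Q-values, one per action.
    All learnable parameters live in q_func (the Q-network); the policy
    object only adds epsilon-greedy action selection on top of it.
    """
    def __init__(self, q_func, n_actions, epsilon=0.05):
        self.q_func = q_func
        self.n_actions = n_actions
        self.epsilon = epsilon

    def predict(self, obs, deterministic=False):
        if not deterministic and random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        q = self.q_func(obs)
        return max(range(self.n_actions), key=lambda a: q[a])

# Toy Q-function: prefers the action equal to obs modulo 2
policy = DQNPolicy(lambda obs: [1.0 if a == obs % 2 else 0.0
                                for a in range(2)], n_actions=2)
print(policy.predict(3, deterministic=True))  # argmax of [0.0, 1.0] -> 1
```

This matches the intuition in the question: the "policy" classes name the network architecture used for Q(s, a), not a separately parameterised policy.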

1 reaction
Miffyli commented, Sep 16, 2021
Read more comments on GitHub >

Top Results From Across the Web

DQN — Stable Baselines 2.10.3a0 documentation
DQN Policies¶ · obs – (np.ndarray float or int) The current observation of the environment · state – (np.ndarray float) The last states...
What is difference between DQN and Policy Gradient methods?
DQN is a form of Q-learning with function approximation (using a neural network ), ... In contrast, policy gradient methods try to learn...
Deep Q Network vs Policy Gradients - Felix Yu
A close variant called Double DQN (DDQN) basically uses 2 neural networks to perform the Bellman iteration, one for generating the prediction ...
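The Double DQN trick mentioned above can be sketched in a few lines: the online network selects the next action, while the second (target) network evaluates it. A plain-Python sketch with hypothetical names, assuming Q-values are passed in as lists:

```python
def double_dqn_target(reward, next_q_online, next_q_target,
                      gamma=0.99, done=False):
    """Bellman target for Double DQN.

    next_q_online: Q-values for s' from the online network (selects the action).
    next_q_target: Q-values for s' from the target network (evaluates it).
    Decoupling selection from evaluation reduces the overestimation
    bias of the vanilla DQN max operator.
    """
    if done:
        return reward  # terminal state: no bootstrapped future value
    best_action = max(range(len(next_q_online)),
                      key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]

# Online net picks action 1; target net evaluates it: 1.0 + 0.9 * 0.4 = 1.36
target = double_dqn_target(1.0, [0.2, 0.8], [0.5, 0.4], gamma=0.9)
```

Vanilla DQN would instead use `max(next_q_target)` directly, letting the same network both pick and score the action.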
DQN Explained - Papers With Code
It is usually used in conjunction with Experience Replay, for storing the episode steps in memory for off-policy learning, where samples are drawn...
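The experience replay mentioned in that snippet is just a bounded store of transitions sampled uniformly at random for off-policy updates. A minimal sketch (hypothetical class, not a specific library's implementation):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done)
    transitions. Sampling uniformly at random breaks the temporal
    correlation between consecutive environment steps."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)  # oldest entry is evicted when full

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=2)
for t in [("s0", 0, 0.0, "s1", False),
          ("s1", 1, 1.0, "s2", False),
          ("s2", 0, 0.5, "s3", True)]:
    buf.add(t)
print(len(buf))  # capacity is 2, so the oldest transition was dropped -> 2
```

Because the buffer holds transitions from many past behaviour policies, learning from it is inherently off-policy, which is exactly why it pairs naturally with Q-learning.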
Chapter 4. Learning to pick the best policy: Policy gradient ...
What if we skip selecting a policy on top of the DQN and instead train a neural network to output an action directly?...
