What is a DQN policy?
In the module stable_baselines.deepq.policies, you have a lot of policies for DQN. This is all very nice, but Q-learning (and DQN) is an off-policy, value-based method. That means we should be learning a state-action value function Q(s, a), and the behaviour policy (like epsilon-greedy) is derived from that value function. Such a policy does not need parameters of its own, apart from ε. So what the heck do these policies represent? Are they actually the state-action value function Q(s, a) (in the case of DQN, a neural network), or something else? That is the only thing in DQN that has parameters anyway, so I guess all those “policies” are really neural networks that represent Q(s, a). Right? The documentation is as useless as it could ever be:
Policy object that implements a DQN policy
Yes, I have not looked at the code yet.
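A minimal sketch of the distinction being asked about, in plain Python. All names here (LinearQFunction, greedy_action, epsilon_greedy_action) are hypothetical and are not the stable_baselines API; a linear Q(s, a) stands in for the neural network. The only learned parameters live in the Q-function; the greedy and epsilon-greedy policies are just rules applied to its outputs, with ε as a fixed hyperparameter.

```python
import random
from typing import List, Sequence


class LinearQFunction:
    """Hypothetical linear Q(s, a): one weight vector per action.

    In DQN this role is played by a neural network; these weights are
    the only learned parameters in the whole agent.
    """

    def __init__(self, weights: List[List[float]]):
        self.weights = weights  # weights[a] has one entry per state feature

    def q_values(self, state: Sequence[float]) -> List[float]:
        # Q(s, a) for every action a
        return [sum(w * x for w, x in zip(w_a, state)) for w_a in self.weights]


def greedy_action(q: LinearQFunction, state: Sequence[float]) -> int:
    # Greedy (target) policy: argmax_a Q(s, a); it has no parameters of its own
    values = q.q_values(state)
    return max(range(len(values)), key=values.__getitem__)


def epsilon_greedy_action(q: LinearQFunction, state: Sequence[float],
                          epsilon: float, rng=random) -> int:
    # Behaviour policy: a uniformly random action with probability epsilon,
    # otherwise the greedy action. epsilon is a hyperparameter, not learned.
    if rng.random() < epsilon:
        return rng.randrange(len(q.weights))
    return greedy_action(q, state)


q = LinearQFunction([[1.0, 0.0], [0.0, 1.0]])
print(q.q_values([0.2, 0.9]), greedy_action(q, [0.2, 0.9]))
```

Both policies are derived from q; training DQN only ever updates q.weights (the network), which is why the “policy” objects are effectively Q-networks plus an action-selection rule.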
Created 3 years ago; 15 comments.
Top GitHub Comments
Note: DQN only has a Q-network, so it would be:
net_arch=[]
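A sketch of what a plain list for net_arch describes, assuming (as in Stable-Baselines3's DQN policy_kwargs) that it gives the hidden-layer widths of the Q-network, whose output layer has one unit per action. The helper q_network_layer_shapes is hypothetical and only computes layer shapes; it does not build a network.

```python
from typing import List, Tuple


def q_network_layer_shapes(obs_dim: int, net_arch: List[int],
                           n_actions: int) -> List[Tuple[int, int]]:
    """Return (in_features, out_features) for each linear layer of an MLP Q-network.

    Assumption: net_arch lists hidden-layer widths, and the final layer
    outputs one Q-value per discrete action.
    """
    sizes = [obs_dim] + list(net_arch) + [n_actions]
    return list(zip(sizes[:-1], sizes[1:]))


print(q_network_layer_shapes(4, [], 2))        # no hidden layers
print(q_network_layer_shapes(4, [64, 64], 2))  # two hidden layers of width 64
```

With net_arch=[] there are no hidden layers, so Q(s, a) reduces to a single linear map from the observation to one value per action. Because DQN has no separate actor, the list describes only the Q-network, with no pi/vf branches as in actor-critic policies.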
@sheila-janota Yup!