Clarify which policies share weights between policy and value network.
The documentation does not state explicitly whether the default policies in common.policies
share weights between the value network and the policy network. After carefully reading the code, I could deduce that:
- LstmPolicy shares all weights between value and policy network except for the very last linear layer.
- FeedForwardPolicy shares weights if the ‘cnn’ extractor is used but uses two entirely different data streams if the ‘mlp’ extractor is used.
Are there any justifications for this specific setup?
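To make the two layouts concrete, here is a minimal pure-Python sketch of an actor-critic with a shared trunk (what the issue describes for LstmPolicy and the ‘cnn’ extractor) next to one with two separate streams (the ‘mlp’ case). This is not the library's code. The class names `Dense`, `SharedTrunkActorCritic`, `SeparateStreamsActorCritic` and the helper `count_params` are hypothetical, and the layer sizes are arbitrary.

```python
import math
import random


class Dense:
    """Fully connected layer in plain Python (illustrative, not library code)."""

    def __init__(self, n_in, n_out, rng, activation=None):
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                        for _ in range(n_out)]
        self.biases = [0.0] * n_out
        self.activation = activation

    def __call__(self, x):
        pre = [sum(w * xi for w, xi in zip(row, x)) + b
               for row, b in zip(self.weights, self.biases)]
        return [self.activation(v) for v in pre] if self.activation else pre

    def n_params(self):
        return sum(len(row) for row in self.weights) + len(self.biases)


class SharedTrunkActorCritic:
    """One shared feature layer; only the final linear heads differ."""

    def __init__(self, obs_dim, hidden, n_actions, rng):
        self.trunk = Dense(obs_dim, hidden, rng, activation=math.tanh)
        self.pi_head = Dense(hidden, n_actions, rng)  # action logits
        self.vf_head = Dense(hidden, 1, rng)          # state value

    def __call__(self, obs):
        features = self.trunk(obs)  # computed once, used by both heads
        return self.pi_head(features), self.vf_head(features)[0]

    def layers(self):
        return [self.trunk, self.pi_head, self.vf_head]


class SeparateStreamsActorCritic:
    """Two independent streams; no weights are shared."""

    def __init__(self, obs_dim, hidden, n_actions, rng):
        self.pi_trunk = Dense(obs_dim, hidden, rng, activation=math.tanh)
        self.vf_trunk = Dense(obs_dim, hidden, rng, activation=math.tanh)
        self.pi_head = Dense(hidden, n_actions, rng)
        self.vf_head = Dense(hidden, 1, rng)

    def __call__(self, obs):
        logits = self.pi_head(self.pi_trunk(obs))
        value = self.vf_head(self.vf_trunk(obs))[0]
        return logits, value

    def layers(self):
        return [self.pi_trunk, self.vf_trunk, self.pi_head, self.vf_head]


def count_params(model):
    """Total number of trainable scalars across all layers of a model."""
    return sum(layer.n_params() for layer in model.layers())


rng = random.Random(0)
obs = [0.1, -0.2, 0.3, 0.4]  # arbitrary 4-dimensional observation
shared = SharedTrunkActorCritic(obs_dim=4, hidden=8, n_actions=2, rng=rng)
separate = SeparateStreamsActorCritic(obs_dim=4, hidden=8, n_actions=2, rng=rng)

logits, value = shared(obs)
print(len(logits), count_params(shared), count_params(separate))
```

In the shared layout, gradients from both the policy loss and the value loss flow into the same trunk. In the separate layout, each loss updates only its own stream, at the cost of duplicating the hidden layer.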
Issue Analytics
- State:
- Created 5 years ago
- Comments: 18 (1 by maintainers)
Top GitHub Comments
Uhhh I can hear a PR rumbling in the distance …
I like the intermediary approach. It takes away the option to “rejoin” the two data streams which was probably useless in the first place. I might start to work on it when I get around to it.