Clarify which policies share weights between policy and value network.

The documentation does not state explicitly whether the default policies in common.policies share weights between the value network and the policy network. After carefully reading the code, I deduced the following:

LstmPolicy shares all weights between the value and policy networks except for the very last linear layer.
FeedForwardPolicy shares weights if the ‘cnn’ extractor is used, but uses two entirely separate data streams if the ‘mlp’ extractor is used.

Is there any justification for this specific setup?
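
To make the distinction in the question concrete, here is a minimal sketch of the two wiring patterns it describes: a trunk shared by both heads versus two fully separate streams. It is not the library's code. The class names `SharedTrunk` and `SeparateStreams`, the helper `shared_layers`, and the layer sizes are all hypothetical, and plain Python lists stand in for real tensors.

```python
import math
import random


def dense(n_in, n_out, rng):
    """Create a bias-free dense layer as a list of weight rows."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def apply(layer, x, activation=None):
    """Multiply input vector x by the layer weights, then optionally activate."""
    out = [sum(w * xi for w, xi in zip(row, x)) for row in layer]
    return [activation(v) for v in out] if activation else out


class SharedTrunk:
    """Hypothetical: policy and value heads read the same hidden features."""

    def __init__(self, n_obs, n_hidden, n_actions, rng):
        self.trunk = dense(n_obs, n_hidden, rng)
        self.pi_head = dense(n_hidden, n_actions, rng)
        self.vf_head = dense(n_hidden, 1, rng)

    def forward(self, obs):
        h = apply(self.trunk, obs, math.tanh)
        return apply(self.pi_head, h), apply(self.vf_head, h)[0]

    def policy_layers(self):
        return [self.trunk, self.pi_head]

    def value_layers(self):
        return [self.trunk, self.vf_head]


class SeparateStreams:
    """Hypothetical: policy and value each get their own hidden layer."""

    def __init__(self, n_obs, n_hidden, n_actions, rng):
        self.pi_trunk = dense(n_obs, n_hidden, rng)
        self.vf_trunk = dense(n_obs, n_hidden, rng)
        self.pi_head = dense(n_hidden, n_actions, rng)
        self.vf_head = dense(n_hidden, 1, rng)

    def forward(self, obs):
        h_pi = apply(self.pi_trunk, obs, math.tanh)
        h_vf = apply(self.vf_trunk, obs, math.tanh)
        return apply(self.pi_head, h_pi), apply(self.vf_head, h_vf)[0]

    def policy_layers(self):
        return [self.pi_trunk, self.pi_head]

    def value_layers(self):
        return [self.vf_trunk, self.vf_head]


def shared_layers(model):
    """Return the layers that both the policy and the value output depend on."""
    policy_ids = {id(layer) for layer in model.policy_layers()}
    return [layer for layer in model.value_layers() if id(layer) in policy_ids]


rng = random.Random(0)
shared = SharedTrunk(4, 8, 2, rng)
separate = SeparateStreams(4, 8, 2, rng)
print(len(shared_layers(shared)), len(shared_layers(separate)))  # prints "1 0"
```

In the shared variant, a gradient from either the policy loss or the value loss would update the trunk, so the two objectives interact. In the separate variant they cannot influence each other's features.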

Issue Analytics

State:
Created 5 years ago
Comments: 18 (1 by maintainers)

Top GitHub Comments

2 reactions

ernestum commented, Dec 3, 2018

Uhhh I can hear a PR rumbling in the distance …

1 reaction

ernestum commented, Nov 27, 2018

I like the intermediary approach. It removes the option to “rejoin” the two data streams, which was probably useless in the first place. I might start working on it when I find the time.
