Question about policy_loss
Hi, thank you for your great work!
I have a question related to #10.
Can you explain the meaning of the code below in GaussianPolicy?
# Enforcing Action Bound
log_prob -= torch.log(1 - action.pow(2) + epsilon)
Also, could you share the reference you used when implementing this loss?
Anyway, thank you for sharing your great code.
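To see why `epsilon` appears in the quoted `Enforcing Action Bound` line, here is a minimal sketch in plain Python, with the standard `math` module standing in for PyTorch. It assumes `epsilon = 1e-6`, a hypothetical value since the question does not show it, and `squash_correction` is an illustrative name. When the pre-squash sample is large, `tanh` rounds to exactly 1.0 in float64, so `1 - action**2` is 0 and its log is undefined; adding `epsilon` keeps the correction finite:

```python
import math

EPSILON = 1e-6  # hypothetical value; the question does not show what epsilon is


def squash_correction(action, epsilon=EPSILON):
    """Term subtracted from the Gaussian log-prob: log(1 - action^2 + epsilon)."""
    return math.log(1.0 - action ** 2 + epsilon)


x = 20.0
action = math.tanh(x)  # tanh(20.0) rounds to exactly 1.0 in float64
print(action == 1.0)

try:
    math.log(1.0 - action ** 2)  # log(0.0) raises "math domain error"
except ValueError as err:
    print("without epsilon:", err)

print("with epsilon:", squash_correction(action))  # equals log(1e-6)
```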
Issue Analytics
- Created 4 years ago
- Comments: 11 (5 by maintainers)
Top GitHub Comments
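In Gaussian policies, the action is computed like this: x is sampled from a Gaussian N(mean, std^2), squashed with y = tanh(x), and scaled to a = k * y, where k is the action scale.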
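A minimal sketch of that sampling step for a single action dimension, using Python's `random` and `math` modules in place of PyTorch. `sample_squashed_action`, `mean`, `std` and `k` are illustrative names, not the repository's API. If an implementation also adds a constant offset (a bias) after scaling, the derivation below is unchanged, because a constant shift contributes a factor of 1 to the Jacobian:

```python
import math
import random


def sample_squashed_action(mean, std, k, rng=random):
    """Sample a = k * tanh(x) with x ~ N(mean, std^2).

    Returns (x, y, a): the raw Gaussian sample, its tanh, and the scaled action.
    """
    x = rng.gauss(mean, std)  # unbounded Gaussian sample
    y = math.tanh(x)          # squashed into [-1, 1]
    a = k * y                 # scaled into [-k, k]
    return x, y, a


rng = random.Random(0)
samples = [sample_squashed_action(0.5, 2.0, 3.0, rng) for _ in range(1000)]
# Every action stays inside the bound, however large the Gaussian sample is.
print(max(abs(a) for _, _, a in samples) <= 3.0)
```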
Here is how to calculate the log-likelihood of the action (needed for the entropy term).
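The likelihood of x is p(x) = 1 / (sqrt(2 pi) * std) * exp(-(x - mean)^2 / (2 * std^2))
The likelihood of y is p(y) = p(x) / |dy/dx| = p(x) / (1 - tanh(x)^2) = p(x) / (1 - y^2)
The likelihood of a is p(a) = p(y) * |dy/da| = p(y) / k = p(x) / (k * (1 - y^2))
Finally, you get log(p(a)) = log(p(x)) - log(k * (1 - y^2)).
With k = 1 and action = tanh(x), the last term is log(1 - action^2), which is what the line in the question subtracts; epsilon is a small constant that keeps the log finite when tanh saturates to ±1.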
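The final formula can be checked numerically. The sketch below (plain Python, one action dimension, illustrative names `gaussian_log_prob` and `squashed_log_prob`) evaluates log p(a) = log p(x) - log(k * (1 - y^2)) and integrates p(a) over (-k, k) with the midpoint rule; without the correction term the result would in general not be 1. For a multi-dimensional action squashed element-wise, the per-dimension log terms are summed.

```python
import math


def gaussian_log_prob(x, mean, std):
    """log p(x) for x ~ N(mean, std^2)."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def squashed_log_prob(a, mean, std, k):
    """log p(a) for a = k * tanh(x), x ~ N(mean, std^2), with |a| < k.

    Change of variables: log p(a) = log p(x) - log(k * (1 - y^2)).
    """
    y = a / k
    x = math.atanh(y)  # invert the squashing to recover the Gaussian sample
    return gaussian_log_prob(x, mean, std) - math.log(k * (1.0 - y ** 2))


# Sanity check: the density of a should integrate to about 1 over (-k, k).
mean, std, k = 0.3, 0.8, 2.0
n = 200_000
h = 2.0 * k / n  # midpoints never touch the endpoints, so atanh stays finite
total = h * sum(
    math.exp(squashed_log_prob(-k + (i + 0.5) * h, mean, std, k)) for i in range(n)
)
print(f"integral of p(a) over (-k, k): {total:.4f}")
```

The sketch recovers x with atanh only for this check; a sampler would normally keep x from the sampling step, since atanh loses precision as |y| approaches 1.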
@ku2482 Ok, happy ending. 😃
Thank you!