Question about policy_loss

Hi, thank you for your great work!

I have a question related to #10. Could you explain what the code below in GaussianPolicy does?

# Enforcing Action Bound
log_prob -= torch.log(1 - action.pow(2) + epsilon)

Also, could you point me to the references you used when implementing this loss?

Anyway, thank you for sharing your great code.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

4 reactions
toshikwa commented, Oct 5, 2019

In Gaussian policies, the action is computed as follows.

  1. x ~ Gaussian(mean, std)
  2. y = tanh(x)
  3. a = k * y

Here is how to calculate the log-likelihood of the action (needed to get the entropy), using the change-of-variables rule.

  1. The likelihood of x is p(x) = 1 / (sqrt(2 * pi) * std) * exp(-(x - mean)^2 / (2 * std^2))

  2. The likelihood of y is p(y) = p(x) / |dy/dx| = p(x) / (1 - tanh(x)^2) = p(x) / (1 - y^2)

  3. The likelihood of a is p(a) = p(y) * |dy/da| = p(y) / k = p(x) / (k * (1 - y^2))

Finally, taking the log gives log(p(a)) = log(p(x)) - log(k * (1 - y^2))
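The derivation above can be checked numerically. The following is a minimal sketch in plain Python rather than the repository's PyTorch code; the function names are hypothetical, and `epsilon` is assumed to play the same stabilizing role as in the quoted snippet. With k = 1 the subtracted term reduces to log(1 - y^2 + epsilon), which appears to be what the `log_prob -= ...` line in the question computes when `action` holds tanh(x).

```python
import math


def gaussian_log_prob(x, mean, std):
    """Log-density of x under Gaussian(mean, std)."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def squashed_action_log_prob(x, mean, std, k=1.0, epsilon=1e-6):
    """Return (a, log p(a)) for a = k * tanh(x), following steps 1-3 above.

    epsilon is a small stabilizer (an assumption mirroring the quoted snippet).
    """
    y = math.tanh(x)
    a = k * y
    log_p_a = gaussian_log_prob(x, mean, std) - math.log(k * (1.0 - y ** 2) + epsilon)
    return a, log_p_a


def action_cdf(a, mean, std, k):
    """P(A <= a): invert the squashing, then use the Gaussian CDF."""
    x = math.atanh(a / k)
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def numeric_density(a, mean, std, k, h=1e-5):
    """Central-difference estimate of p(a), used only as a sanity check."""
    return (action_cdf(a + h, mean, std, k) - action_cdf(a - h, mean, std, k)) / (2.0 * h)


if __name__ == "__main__":
    # Illustrative values only; epsilon=0.0 so the formula is compared exactly.
    a, log_p = squashed_action_log_prob(x=0.3, mean=0.1, std=0.5, k=2.0, epsilon=0.0)
    print(a, math.exp(log_p), numeric_density(a, 0.1, 0.5, 2.0))
```

The analytic density and the finite-difference estimate should agree closely, which is a quick way to confirm that the Jacobian correction has the right sign and includes the factor k.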

0 reactions
sunnyswag commented, Oct 6, 2019

@ku2482 Ok, happy ending. 😃

Thank you!
