
Question about policy_loss


Hi, thank you for your great work!!

I have a question related to #10. Could you explain what the following code in GaussianPolicy does?

# Enforcing Action Bound
log_prob -= torch.log(1 - action.pow(2) + epsilon)

Also, could you point me to the references you used when implementing this loss?

Anyway, thank you for sharing your great code.

Issue Analytics

State:
Created 4 years ago
Comments: 11 (5 by maintainers)

Top GitHub Comments

4 reactions

toshikwa commented, Oct 5, 2019

In Gaussian policies, the action is computed as follows:

x ~ Gaussian(mean, std)
y = tanh(x)
a = k * y
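The three sampling steps can be written as a short Python sketch using only the standard library. The function name `sample_squashed_action` and the use of plain floats instead of PyTorch tensors are illustrative assumptions, not the repository's code; k is assumed to be positive.

```python
import math
import random


def sample_squashed_action(mean: float, std: float, k: float,
                           rng: random.Random) -> tuple[float, float, float]:
    """Hypothetical helper: x ~ Gaussian(mean, std), y = tanh(x), a = k * y.

    Returns (x, y, a) so the pre-squash sample x stays available for
    computing the log-likelihood of the action.
    """
    x = rng.gauss(mean, std)  # unbounded Gaussian sample
    y = math.tanh(x)          # squashed into [-1, 1]
    a = k * y                 # rescaled into [-k, k] (assumes k > 0)
    return x, y, a


rng = random.Random(0)
x, y, a = sample_squashed_action(mean=0.5, std=1.0, k=2.0, rng=rng)
print(x, y, a)
```

Keeping x around matters because the log-likelihood derivation in this comment is expressed in terms of p(x) and y, not in terms of a alone.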

Here is how to calculate the log-likelihood of the action (needed for the entropy term):

The likelihood of x is p(x) = 1 / (sqrt(2pi) * std) * exp(-(x - mean)^2 / (2 * std^2))
The likelihood of y is p(y) = p(x) / |dy/dx| = p(x) / (1 - tanh(x)^2) = p(x) / (1 - y^2)
The likelihood of a is p(a) = p(y) * |dy/da| = p(y) / k = p(x) / (k * (1 - y^2))

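Finally, taking logs gives log(p(a)) = log(p(x)) - log(k * (1 - y^2)), i.e. the Gaussian log-likelihood of x minus a correction for the tanh squashing and the scaling by k.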
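The derivation can be checked numerically. The sketch below is scalar Python rather than PyTorch, and the helper names (`gaussian_log_prob`, `squashed_log_prob`, `numerical_density_of_a`) and the default epsilon of 1e-6 are hypothetical. It computes log p(a) from the pre-squash sample x using the formula above and compares p(a) against a finite-difference estimate built from the CDF of a, assuming k > 0 so that a = k * tanh(x) is increasing.

```python
import math


def gaussian_log_prob(x: float, mean: float, std: float) -> float:
    """log p(x) for x ~ Gaussian(mean, std)."""
    return -((x - mean) ** 2) / (2 * std ** 2) - math.log(math.sqrt(2 * math.pi) * std)


def squashed_log_prob(x: float, mean: float, std: float, k: float,
                      epsilon: float = 1e-6) -> float:
    """log p(a) for a = k * tanh(x), via log p(a) = log p(x) - log(k * (1 - y^2)).

    epsilon (an assumed default) keeps the log finite when tanh(x) rounds to +/-1.
    """
    y = math.tanh(x)
    return gaussian_log_prob(x, mean, std) - math.log(k * (1 - y ** 2) + epsilon)


def numerical_density_of_a(a: float, mean: float, std: float, k: float,
                           h: float = 1e-5) -> float:
    """Central-difference estimate of p(a) from the CDF of a (valid for |a| < k - h)."""
    def cdf(t: float) -> float:
        # P(A <= t) = P(X <= atanh(t / k)) because a = k * tanh(x) is increasing for k > 0.
        z = (math.atanh(t / k) - mean) / std
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (cdf(a + h) - cdf(a - h)) / (2.0 * h)


# Usage: both estimates of p(a) should agree closely (epsilon disabled for the comparison).
mean, std, k = 0.3, 0.8, 2.0
x = 0.7
a = k * math.tanh(x)
analytic = math.exp(squashed_log_prob(x, mean, std, k, epsilon=0.0))
numeric = numerical_density_of_a(a, mean, std, k)
print(analytic, numeric)
```

With k = 1 the action equals y, so the `1 - action.pow(2)` term in the quoted GaussianPolicy line matches this correction; for k other than 1, the derivation suggests subtracting log(k * (1 - y^2)), computed from y rather than from the rescaled action. In float64, tanh(x) rounds to exactly 1.0 once |x| is larger than about 20, which makes 1 - y^2 zero; a small epsilon inside the log keeps the result finite in that case.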

0 reactions

sunnyswag commented, Oct 6, 2019

@ku2482 Ok, happy ending. 😃

Thank you!
