
[Feature Request] Better KL Divergence Approximation


Important Note: We do not do technical support or consulting, and we do not answer personal questions by email. Please post such questions on the RL Discord, Reddit, or Stack Overflow instead.

🚀 Feature

Use a better estimate for the KL Divergence in the PPO algorithm.

The estimator I propose is the reverse estimator shown in the image below:

[image: proposed KL estimator (not reproduced)]

This has lower variance, since the extra term is negatively correlated with the naive estimator, and it is also non-negative (positive semi-definite; see the motivation section).
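The image itself is not reproduced here. Based on the John Schulman blog post linked in the comments below (an inference from context, not a verbatim copy of the image), the two estimators can be written as:

```latex
% Let r = p(x) / q(x), with samples x ~ q, estimating KL(q || p).
% Naive estimator used in the current code:
\[ k_1 = -\log r \]
% Proposed ("reverse") estimator, also unbiased since
% E_{x \sim q}[r - 1] = 0, and always non-negative since
% k_3 = e^{\log r} - 1 - \log r and e^y - 1 - y >= 0 for all y:
\[ k_3 = (r - 1) - \log r \]
```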

Motivation

In the PPO algorithm, an approximate KL divergence is used as a final safeguard against overly large policy updates within a single training step. The line used is:

`approx_kl_divs.append(th.mean(rollout_data.old_log_prob - log_prob).detach().cpu().numpy())`

This estimator is unbiased, but it has high variance because individual samples can be negative (whereas the true KL divergence is always non-negative)! This can cause problems in the check:

`if self.target_kl is not None and np.mean(approx_kl_divs) > 1.5 * self.target_kl:`

since large negative samples in `approx_kl_divs` drag the mean down and can mask a genuinely large update.
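To make the variance claim concrete, here is a small self-contained sketch (illustrative only, not SB3 code) comparing the naive estimator `k1 = -log r` with the proposed `k3 = (r - 1) - log r` on two Gaussians whose true KL divergence is known in closed form:

```python
import numpy as np

# Illustrative comparison of the two KL estimators, with
# q = N(0, 1) and p = N(0.1, 1), so that the true value is
# KL(q || p) = 0.1**2 / 2 = 0.005.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1_000_000)  # samples from q

# log r = log p(x) - log q(x); the normalizing constants cancel
log_r = (-(x - 0.1) ** 2 / 2) - (-(x ** 2) / 2)

k1 = -log_r                    # naive estimator: unbiased, but can go negative
k3 = np.expm1(log_r) - log_r   # proposed estimator: unbiased and always >= 0

print(k1.mean(), k3.mean())    # both close to the true KL of 0.005
print(k1.var(), k3.var())      # k3's variance is far smaller
```

The key point is that every sample of `k3` is non-negative, so its mean cannot be dragged below zero by outliers the way the mean of `k1` can.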

Pitch

I want to replace the KL Divergence equation currently used in the PPO algorithm with the better approximation described above.

NOTE: This is essentially a one-line change.

Alternatives

N/A

Additional context

[image: additional context (not reproduced)]

Checklist

  • I have checked that there is no similar issue in the repo (required)

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 13 (8 by maintainers)

Top GitHub Comments

09tangriro commented on May 4, 2021 (2 reactions)

See this article for some good stuff on forward vs reverse KL: https://dibyaghosh.com/blog/probability/kldivergence.html

09tangriro commented on May 3, 2021 (2 reactions)

Sure! Here’s a blog post on the topic by John Schulman: http://joschu.net/blog/kl-approx.html
