[Feature Request] Better KL Divergence Approximation
🚀 Feature
Use a better estimate for the KL Divergence in the PPO algorithm.
The estimator I propose is the one described in John Schulman's note on approximating KL divergence (linked in the comments below): `mean((r - 1) - log(r))`, where `r = pi_new(a|s) / pi_old(a|s)` is the probability ratio.
This has lower variance, since the extra `(r - 1)` term is negatively correlated with `-log(r)`, and every per-sample estimate is non-negative (see the motivation section).
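For reference, here is a short derivation of those two properties, assuming samples are drawn under the old policy and writing `r = pi_new(a|s) / pi_old(a|s)`:

```
E[r] = E_{a ~ pi_old}[ pi_new(a|s) / pi_old(a|s) ] = 1
=> E[(r - 1) - log r] = E[-log r] = KL(pi_old || pi_new)   (still unbiased)
log r <= r - 1  for all r > 0
=> (r - 1) - log r >= 0  for every sample                  (never negative)
```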
Motivation
In the PPO algorithm, the approximate KL divergence is used as an early-stopping criterion to block overly large policy updates within a single update phase. The line currently used is `approx_kl_divs.append(th.mean(rollout_data.old_log_prob - log_prob).detach().cpu().numpy())`. This is an unbiased estimator, but it has high variance, because individual estimates can be negative (unlike the true KL divergence, which is always non-negative)! This can cause problems in the check `if self.target_kl is not None and np.mean(approx_kl_divs) > 1.5 * self.target_kl:`, since the mean does not account for the large negative values that `approx_kl_divs` can take. The short demo below illustrates the difference.
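Here is a hypothetical demo (not stable-baselines3 code, just a sketch): for two nearby Gaussians standing in for the old and new policies, the current estimator (`k1` below) produces negative per-sample values and a larger spread, while the proposed estimator (`k3`) stays non-negative with a smaller spread.

```python
import torch as th

# Hypothetical demo: estimate KL(pi_old || pi_new) for two Gaussians
# from samples drawn under pi_old, comparing both estimators.
pi_old = th.distributions.Normal(0.0, 1.0)
pi_new = th.distributions.Normal(0.1, 1.0)
x = pi_old.sample((100_000,))

log_ratio = pi_new.log_prob(x) - pi_old.log_prob(x)  # plays the role of log_prob - old_log_prob
k1 = -log_ratio                                      # current estimator, per sample
k3 = th.exp(log_ratio) - 1 - log_ratio               # proposed estimator, per sample

true_kl = th.distributions.kl_divergence(pi_old, pi_new)
print(f"true KL : {true_kl.item():.5f}")
print(f"k1      : mean {k1.mean().item():.5f}, std {k1.std().item():.5f}, min {k1.min().item():.3f}  (can be negative)")
print(f"k3      : mean {k3.mean().item():.5f}, std {k3.std().item():.5f}, min {k3.min().item():.3f}  (always >= 0)")
```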
Pitch
I want to replace the KL Divergence equation currently used in the PPO algorithm with the better approximation described above.
NOTE: This is basically a one-line change (see the sketch below).
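Concretely, the change I have in mind looks roughly like this (a sketch against the training loop quoted in the motivation section; `th`, `log_prob`, `rollout_data` and `approx_kl_divs` are the names already used there, and the exact line may differ between versions):

```python
# Current:
approx_kl_divs.append(th.mean(rollout_data.old_log_prob - log_prob).detach().cpu().numpy())

# Proposed (sketch):
log_ratio = log_prob - rollout_data.old_log_prob
approx_kl_divs.append(th.mean(th.exp(log_ratio) - 1 - log_ratio).detach().cpu().numpy())
```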
Alternatives
N/A
Additional context
### Checklist
- I have checked that there is no similar issue in the repo (required)
Top GitHub Comments
See this article for some good stuff on forward vs reverse KL: https://dibyaghosh.com/blog/probability/kldivergence.html
Sure! Here’s a blog post on the topic by John Schulman: http://joschu.net/blog/kl-approx.html