
In PPO, is clipping the value loss with max OK?


In file ‘pposgd_simple.py’, line 117:

    vf_loss = .5 * U.mean(tf.maximum(vfloss1, vfloss2)) # we do the same clipping-based trust region for the value function

Why not tf.minimum?
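For reference, the loss in question can be written out as below. This assumes vfloss1 and vfloss2 are the unclipped and clipped squared errors (as in the ppo2 code quoted in the answer), with $V_\theta$ the new prediction, $V_{\text{old}}$ the prediction at rollout time, $R$ the return, and $\epsilon$ the clip range. The PPO policy term uses a min because it is an objective to be maximized, so the min is a pessimistic lower bound; the value term is a loss to be minimized, so the matching pessimistic bound is the max.

```latex
\hat V_{\text{clip}} = V_{\text{old}} + \operatorname{clip}\left(V_\theta - V_{\text{old}},\, -\epsilon,\, \epsilon\right)

L^{VF}(\theta) = \tfrac{1}{2}\,\mathbb{E}\left[\max\left((V_\theta - R)^2,\ (\hat V_{\text{clip}} - R)^2\right)\right]
\quad \text{(loss, minimized)}

L^{CLIP}(\theta) = \mathbb{E}\left[\min\left(r_t(\theta)\hat A_t,\ \operatorname{clip}\left(r_t(\theta), 1-\epsilon, 1+\epsilon\right)\hat A_t\right)\right]
\quad \text{(objective, maximized)}
```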

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 8

Top GitHub Comments

4 reactions
lezhang-thu commented, Aug 31, 2020

Code from ppo2:

        # Clip the value to reduce variability during Critic training
        # Get the predicted value
        vpred = train_model.vf
        vpredclipped = OLDVPRED + tf.clip_by_value(train_model.vf - OLDVPRED, - CLIPRANGE, CLIPRANGE)
        # Unclipped value
        vf_losses1 = tf.square(vpred - R)
        # Clipped value
        vf_losses2 = tf.square(vpredclipped - R)

        vf_loss = .5 * tf.reduce_mean(tf.maximum(vf_losses1, vf_losses2))
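To experiment outside a TensorFlow graph, here is a minimal NumPy sketch of the same forward computation. It assumes train_model.vf, OLDVPRED and R are plain arrays of equal shape and CLIPRANGE is a scalar; the function name clipped_value_loss is hypothetical and not part of baselines.

```python
import numpy as np


def clipped_value_loss(vpred, old_vpred, returns, clip_range):
    """Hypothetical NumPy sketch of the ppo2-style clipped value loss (forward only)."""
    vpred = np.asarray(vpred, dtype=np.float64)
    old_vpred = np.asarray(old_vpred, dtype=np.float64)
    returns = np.asarray(returns, dtype=np.float64)
    # Keep the new prediction within +/- clip_range of the old prediction
    vpred_clipped = old_vpred + np.clip(vpred - old_vpred, -clip_range, clip_range)
    # Unclipped squared error
    vf_losses1 = np.square(vpred - returns)
    # Clipped squared error
    vf_losses2 = np.square(vpred_clipped - returns)
    # Elementwise max (the pessimistic choice), then mean over the batch
    return 0.5 * np.mean(np.maximum(vf_losses1, vf_losses2))


if __name__ == "__main__":
    # Two samples, OLDVPRED = 0, R = 1, CLIPRANGE = 0.2
    print(clipped_value_loss([0.5, 2.0], [0.0, 0.0], [1.0, 1.0], 0.2))
```

For the example call, the first sample (vpred = 0.5, outside the trust region but below R) takes the clipped term 0.64, and the second (vpred = 2.0, overshooting R) takes the unclipped term 1.0, so the result is about 0.5 * 0.82 = 0.41.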

The PPO algorithm is somewhat like TRPO: both optimize the objective subject to the condition that the update stays within a trust region. I just want to describe one case so you can picture it.

Consider the case OLDVPRED < R. We need to update train_model.vf, i.e. vpred, so that vpred is closer to R after the update.

Case 1: vpred is within the trust region of OLDVPRED. Then nothing special needs to be done: vpredclipped equals vpred, so tf.maximum(vf_losses1, vf_losses2) degenerates to tf.square(vpred - R).

Case 2: vpred is outside the trust region of OLDVPRED. This splits into case 2.1 and case 2.2.

Case 2.1: the update moves vpred closer to the trust region and also closer to R. This is the ideal case, since we want the update both to optimize the objective (get closer to R) and to bring vpred back toward the trust region. It happens when train_model.vf < OLDVPRED - CLIPRANGE. Here you can see why tf.maximum(vf_losses1, vf_losses2) is needed: vpredclipped = OLDVPRED - CLIPRANGE is closer to R than vpred, so vf_losses1 is the larger term and the maximum keeps tf.square(vpred - R) together with its gradient.

Case 2.2 is subtle: the update moves vpred closer to R but farther away from the trust region. In this case we must not update, as that would violate the spirit of a trust region: all updates should happen within the trust region, or at least make staying within it more likely. Case 2.2 happens, for example, when OLDVPRED + CLIPRANGE < train_model.vf < R. This time tf.maximum chooses vf_losses2. Because the clip is saturated, vpredclipped is fixed at OLDVPRED + CLIPRANGE, so vf_losses2 is a constant with respect to the model parameters: its gradient is zero, and no update happens.
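The case analysis above can be checked numerically. The sketch below hand-derives the per-sample gradient of the loss with respect to vpred (no TensorFlow; np.maximum and np.minimum stand in for tf.maximum and tf.minimum) so the two reductions can be compared. The function name value_loss_grad and the example values OLDVPRED = 0, R = 1, CLIPRANGE = 0.2 are hypothetical. A negative gradient means gradient descent raises vpred toward R.

```python
import numpy as np


def value_loss_grad(vpred, old_vpred, ret, clip_range, reduce=np.maximum):
    """Hypothetical helper: d/d(vpred) of 0.5 * reduce(loss1, loss2) for one sample.

    Hand-derived: d/dv (v - R)^2 = 2 (v - R), and the clipped branch only
    passes gradient while vpred - old_vpred is strictly inside the clip range.
    """
    delta = vpred - old_vpred
    vpred_clipped = old_vpred + np.clip(delta, -clip_range, clip_range)
    loss1 = (vpred - ret) ** 2           # unclipped
    loss2 = (vpred_clipped - ret) ** 2   # clipped
    grad1 = 2.0 * (vpred - ret)
    # A saturated clip has zero gradient with respect to vpred
    inside = -clip_range < delta < clip_range
    grad2 = 2.0 * (vpred_clipped - ret) if inside else 0.0
    # Follow whichever branch the reduction selects (on ties both grads agree)
    picked_first = reduce(loss1, loss2) == loss1
    return 0.5 * (grad1 if picked_first else grad2)


if __name__ == "__main__":
    old, R, eps = 0.0, 1.0, 0.2
    for name, v in [("case 1", 0.1), ("case 2.1", -0.5), ("case 2.2", 0.5)]:
        print(name,
              "max:", value_loss_grad(v, old, R, eps, np.maximum),
              "min:", value_loss_grad(v, old, R, eps, np.minimum))
```

With max, case 1 gives -0.9, case 2.1 gives -1.5 (pull toward R and back into the trust region), and case 2.2 gives 0 (no update). Swapping in min flips the two outside cases: case 2.1 gets zero gradient and stays stuck outside, while case 2.2 gets -0.5 and is pushed further out of the trust region.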

Hope this explanation helps anyone who needs it.

0 reactions
yueyang130 commented, Nov 1, 2021

@lezhang-thu Very clear and logical explanation! Thanks!


Top Results From Across the Web

PPO Hyperparameters and Ranges - Medium
Clip parameter illustration from Schulman et al ... Explanation for the Value Function loss (2nd term) from the PPO paper:.

Decaying Clipping Range in Proximal Policy Optimization - arXiv
This simple yet powerful idea prevents large policy updates during optimization.

Proximal Policy Optimization Tutorial (Part 2/2: GAE and PPO ...
The value of epsilon is suggested to be kept at 0.2 in the paper. Critic loss is nothing but the usual mean squared...

RL - Policy Proximal Optimization and clipping - Cross Validated
Essentially, we look to increase the likelihood of an action, at, if the advantage function, At>0 and we clip the value of the...

Clipped Proximal Policy Optimization - GitHub Pages
Very similar to PPO, with several small (but very simplifying) changes: Train both the value and policy networks, simultaneously, by defining a single...
