[Feature Request] Change the way the logarithm of the standard deviation is handled for SAC
🚀 Feature
Change the way the logarithm of the standard deviation is handled for SAC.
Motivation
In sac/policies.py, lines 169-171, we use:
log_std = self.log_std(latent_pi)
# Original Implementation to cap the standard deviation
log_std = th.clamp(log_std, LOG_STD_MIN, LOG_STD_MAX)
However, this may lead to zero gradients when log_std falls outside the range, because torch.clamp has zero gradient outside [LOG_STD_MIN, LOG_STD_MAX].
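A minimal sketch of the vanishing gradient, assuming the usual SAC bounds LOG_STD_MIN = -20 and LOG_STD_MAX = 2 (the out-of-range value 5.0 is just for illustration):

import torch as th

LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0  # assumed bounds for illustration

# An out-of-range value: clamp has zero gradient outside the bounds,
# so no learning signal pushes log_std back into range.
log_std = th.tensor([5.0], requires_grad=True)
clamped = th.clamp(log_std, LOG_STD_MIN, LOG_STD_MAX)
clamped.sum().backward()
print(log_std.grad)  # tensor([0.])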
Alternatives
Replace the code above with:
log_std = torch.tanh(log_std)
log_std = LOG_STD_MIN + 0.5 * (LOG_STD_MAX - LOG_STD_MIN) * (log_std + 1)
as in rad, lines 81-84.
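A minimal sketch of why the soft clamp keeps the gradient alive, under the same assumed bounds and test value as above:

import torch

LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0  # assumed bounds for illustration

# tanh squashes log_std into (-1, 1); the affine rescaling then maps
# that interval onto [LOG_STD_MIN, LOG_STD_MAX].
log_std = torch.tensor([5.0], requires_grad=True)
soft = torch.tanh(log_std)
soft = LOG_STD_MIN + 0.5 * (LOG_STD_MAX - LOG_STD_MIN) * (soft + 1)
soft.sum().backward()
print(log_std.grad)  # small but nonzero: roughly 2e-3 here

Since tanh saturates, the gradient far from zero is tiny, but it never becomes exactly zero as it does with the hard clamp.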
Top GitHub Comments
Hello, I guess the piece of code you are showing was inspired by Spinning Up: https://github.com/openai/spinningup/blob/2ce0ee91826128497078fba7f25ba1d1bd9c3789/spinup/algos/sac/core.py#L66
They had that in the original TF release, but changed it when releasing the PyTorch version: https://github.com/openai/spinningup/commit/20921137141b154454c0a2698709d9f9a0302101#diff-a626546e80a598fcbd10dd94a870a5049b1e3346234aebaa71256158f2147113
If you can show that there is actually a benefit to it, I would agree to that change; otherwise, I would keep it as is, matching the original implementation and avoiding an unnecessary breaking change.
Thank you for telling me that. Considering my own time and energy, I will abandon the proposal.