[SAC] Potential bug in temperature learning
See original GitHub issue

In the paper, the temperature loss is computed as -alpha * (logp + entropy_target), but here it is implemented as -log(alpha) * (logp + entropy_target), if I am not mistaken.
I guess it should look like:
ent_coef_loss = None
if self.ent_coef_optimizer is not None:
    # Important: detach the variable from the graph
    # so we don't change it with other losses
    # see https://github.com/rail-berkeley/softlearning/issues/60
    ent_coef = th.exp(self.log_ent_coef)  # No need to detach here anyway
    ent_coef_loss = -(ent_coef * (log_prob.detach() + self.target_entropy)).mean()
    ent_coef_losses.append(ent_coef_loss.item())
else:
    ent_coef = self.ent_coef_tensor
Of course, you could detach ent_coef afterwards in both cases, to maybe save time when computing actor_loss and doing the respective .backward().
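For illustration, a minimal sketch of that detach variant (same variable names as the snippet above; the actor-loss line and min_qf_pi are assumptions added for context, not quoted from the file):

ent_coef_loss = None
if self.ent_coef_optimizer is not None:
    ent_coef = th.exp(self.log_ent_coef)
    # Temperature loss with the paper's formulation (alpha, not log(alpha))
    ent_coef_loss = -(ent_coef * (log_prob.detach() + self.target_entropy)).mean()
    ent_coef_losses.append(ent_coef_loss.item())
    # Detach afterwards so the actor loss below cannot update log_ent_coef
    ent_coef = ent_coef.detach()
else:
    ent_coef = self.ent_coef_tensor

# ent_coef is now a constant when computing the actor loss
actor_loss = (ent_coef * log_prob - min_qf_pi).mean()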
Issue Analytics
- Created 3 years ago
- Comments: 5 (2 by maintainers)
Top GitHub Comments
Hello, thanks for pointing that out. There are several issues about that:
The short answer is yes, you are right; in practice, it does not change the results (the original implementation was doing that before).
EDIT: it may change something, see https://github.com/rail-berkeley/softlearning/issues/136#issuecomment-620533436
All right, assuming that we constrain alpha >= 0 in the last line but not in the first two lines, then the min is the same. However, you go on and use alpha to compute the other losses, meaning we want to know if the argmin is the same.
I believe that what's really going on here is that the gradient of alpha or some alpha_logit has the same sign depending on whether log_pi + H is positive or negative, and then Adam does its usual magic. Anyway, personally I would prefer to use the originally derived formula if it doesn't degrade the performance…
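As a quick standalone illustration of that sign argument (a small sketch with made-up values, not SB3 code): for any fixed c = log_pi + target_entropy, the gradient of the implemented loss -log_alpha * c with respect to log_alpha is -c, while the paper's loss -exp(log_alpha) * c gives -exp(log_alpha) * c, i.e. the same quantity scaled by alpha > 0, so the update direction is the same.

import torch as th

for c_val in (-2.0, 0.5, 3.0):  # c stands in for log_prob + target_entropy
    c = th.tensor(c_val)

    # Implemented formulation: loss = -log_alpha * c
    log_alpha = th.tensor(0.3, requires_grad=True)
    (-log_alpha * c).backward()
    grad_impl = log_alpha.grad.item()

    # Paper formulation: loss = -alpha * c, with alpha = exp(log_alpha)
    log_alpha = th.tensor(0.3, requires_grad=True)
    (-th.exp(log_alpha) * c).backward()
    grad_paper = log_alpha.grad.item()

    # Same sign, magnitude differs by the factor exp(log_alpha)
    print(f"c={c_val:+.1f}  grad(impl)={grad_impl:+.3f}  grad(paper)={grad_paper:+.3f}")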