[SAC] Potential bug in temperature learning
See original GitHub issue

In the paper, the temperature loss is computed as -alpha * (logp + entropy_target), but here it is implemented as -log(alpha) * (logp + entropy_target), if I am not mistaken.
I guess it should look like:
ent_coef_loss = None
if self.ent_coef_optimizer is not None:
    # Important: detach the variable from the graph
    # so we don't change it with other losses
    # see https://github.com/rail-berkeley/softlearning/issues/60
    ent_coef = th.exp(self.log_ent_coef)  # No need to detach here anyway
    ent_coef_loss = -(ent_coef * (log_prob.detach() + self.target_entropy)).mean()
    ent_coef_losses.append(ent_coef_loss.item())
else:
    ent_coef = self.ent_coef_tensor
Of course, you could detach ent_coef afterwards in both cases, to maybe save time when computing actor_loss and doing the respective .backward().
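For illustration, a minimal sketch of that detach variant (same variable names as the snippet above; the actor-loss line and min_qf_pi are assumptions added for context, not quoted from the file):

ent_coef_loss = None
if self.ent_coef_optimizer is not None:
    ent_coef = th.exp(self.log_ent_coef)
    # Temperature loss with the paper's formulation (alpha, not log(alpha))
    ent_coef_loss = -(ent_coef * (log_prob.detach() + self.target_entropy)).mean()
    ent_coef_losses.append(ent_coef_loss.item())
    # Detach afterwards so the actor loss below cannot update log_ent_coef
    ent_coef = ent_coef.detach()
else:
    ent_coef = self.ent_coef_tensor

# ent_coef is now a constant when computing the actor loss
actor_loss = (ent_coef * log_prob - min_qf_pi).mean()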
Issue Analytics
- Created 3 years ago
- Comments: 5 (2 by maintainers)
Top GitHub Comments
Hello, thanks for pointing that out. There are several issues about that:
The short answer is yes, you are right; in practice, it does not change the results (the original implementation was doing that before).
EDIT: it may change something, see https://github.com/rail-berkeley/softlearning/issues/136#issuecomment-620533436
All right, assuming that we constrain alpha >= 0 in the last line but not in the first two lines, then the min is the same. However, you go on and use alpha to compute the other losses, meaning we want to know if the argmin is the same.
I believe that what's really going on here is that the gradient of alpha or some alpha_logit has the same sign depending on whether log_pi + H is positive or negative, and then Adam does its usual magic. Anyway, personally I would prefer to use the originally derived formula if it doesn't degrade the performance…
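As a quick standalone illustration of that sign argument (a small sketch with made-up values, not SB3 code): for any fixed c = log_pi + target_entropy, the gradient of the implemented loss -log_alpha * c with respect to log_alpha is -c, while the paper's loss -exp(log_alpha) * c gives -exp(log_alpha) * c, i.e. the same quantity scaled by alpha > 0, so the update direction is the same.

import torch as th

for c_val in (-2.0, 0.5, 3.0):  # c stands in for log_prob + target_entropy
    c = th.tensor(c_val)

    # Implemented formulation: loss = -log_alpha * c
    log_alpha = th.tensor(0.3, requires_grad=True)
    (-log_alpha * c).backward()
    grad_impl = log_alpha.grad.item()

    # Paper formulation: loss = -alpha * c, with alpha = exp(log_alpha)
    log_alpha = th.tensor(0.3, requires_grad=True)
    (-th.exp(log_alpha) * c).backward()
    grad_paper = log_alpha.grad.item()

    # Same sign, magnitude differs by the factor exp(log_alpha)
    print(f"c={c_val:+.1f}  grad(impl)={grad_impl:+.3f}  grad(paper)={grad_paper:+.3f}")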