
[SAC] Potential bug in temperature learning

See original GitHub issue

https://github.com/DLR-RM/stable-baselines3/blob/78e8d405d7bf6186c8529ed26967cb17ccbe420c/stable_baselines3/sac/sac.py#L184

In the paper the loss is computed as -alpha * (logp + entropy_target), but here, if I am not mistaken, it is implemented as -log(alpha) * (logp + entropy_target).

I guess it should look like:

    ent_coef_loss = None
    if self.ent_coef_optimizer is not None:
        # Important: detach the variable from the graph
        # so we don't change it with other losses
        # see https://github.com/rail-berkeley/softlearning/issues/60
        ent_coef = th.exp(self.log_ent_coef)  # no need to detach here anyway
        ent_coef_loss = -(ent_coef * (log_prob.detach() + self.target_entropy)).mean()
        ent_coef_losses.append(ent_coef_loss.item())
    else:
        ent_coef = self.ent_coef_tensor

Of course, you could detach ent_coef afterwards in both cases, possibly saving time when computing actor_loss and running the corresponding .backward().
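To see concretely how the two formulations differ, here is a minimal pure-Python sketch of both entropy-coefficient losses on one hypothetical batch. The log-probabilities, target entropy, and initial coefficient below are made-up illustration values, not taken from the repository:

```python
import math

# Hypothetical batch of log-probabilities and a target entropy
# (e.g. SAC's -dim(action_space) heuristic); values chosen arbitrarily.
log_probs = [-1.2, -0.8, -2.5, -0.3]
target_entropy = -4.0

log_ent_coef = math.log(0.5)       # trainable parameter; alpha = exp(log_ent_coef)
ent_coef = math.exp(log_ent_coef)  # alpha = 0.5

mean_term = sum(lp + target_entropy for lp in log_probs) / len(log_probs)

# Loss as implemented (uses log(alpha)):
loss_log_alpha = -log_ent_coef * mean_term
# Loss as derived in the paper (uses alpha itself):
loss_alpha = -ent_coef * mean_term

print(loss_log_alpha, loss_alpha)
```

With alpha < 1, log(alpha) is negative, so for the same batch the two losses can even have opposite signs; they are genuinely different objectives, even though (as discussed below in the thread) their gradients with respect to log(alpha) point the same way.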

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
araffin commented, May 27, 2020

Hello, thanks for pointing that out. There are several existing issues about this.

The short answer is yes, you are right; in practice it does not change the results (the original implementation was doing that before).

EDIT: it may change something: https://github.com/rail-berkeley/softlearning/issues/136#issuecomment-620533436

0 reactions
johannespitz commented, May 27, 2020

All right, assuming that we constrain alpha >= 0 in the last line but not in the first two lines, the min is the same. However, you then go on to use alpha to compute the other losses, which means we want to know whether the argmin is the same.

min E[-alpha * log_pi - alpha * H]
= min E[-alpha * (log_pi + H)]
= min E[-log alpha * (log_pi + H)]

I believe that what’s really going on here is that the gradient with respect to alpha (or some alpha logit) has the same sign in both formulations, determined by whether log_pi + H is positive or negative, and then Adam does its usual magic. Anyway, personally I would prefer to use the originally derived formula if it doesn’t degrade the performance…
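The sign argument above can be checked numerically with a small finite-difference sketch (pure Python; the values of c and phi are arbitrary test points, with c standing in for E[log_pi + H] and phi for log(alpha)):

```python
import math

def grad(f, phi, h=1e-6):
    # Central finite difference; accurate enough for a sign check.
    return (f(phi + h) - f(phi - h)) / (2 * h)

# c plays the role of E[log_pi + H]; try both signs.
for c in (-5.2, 3.1):
    for phi in (-1.0, 0.0, 2.0):  # phi = log(alpha)
        g_log = grad(lambda p: -p * c, phi)              # -log(alpha) form
        g_alpha = grad(lambda p: -math.exp(p) * c, phi)  # -alpha form
        # Analytically: g_log = -c and g_alpha = -exp(phi) * c;
        # since exp(phi) > 0, the two gradients always share a sign.
        assert (g_log > 0) == (g_alpha > 0)
```

So plain gradient descent on log(alpha) moves in the same direction under either loss, only with a different magnitude, which is consistent with the observation that results barely change in practice.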
