[Question] entropy sign, once again
Question
I am confused about the entropy. The definition is $\text{entropy} = -\sum_i p_i \log(p_i)$, but the code reads:
```python
# Entropy loss favor exploration
if entropy is None:
    # Approximate entropy when no analytical form
    entropy_loss = -th.mean(-log_prob)
else:
    entropy_loss = -th.mean(entropy)
```
and it gives the opposite result (in terms of concavity) from using the entropy as defined (with the factor $p_i$ in front of $\log(p_i)$). I haven’t read any papers, but I think the definition of entropy is quite well established. See the different curve shapes below.
(figures: curve shape from the Wikipedia definition vs. the shape obtained when ignoring $p_i$)
What am I missing?
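A minimal sketch of the two curve shapes described above, for a Bernoulli distribution with parameter $p$ (this example is mine, not part of the original issue): the Shannon definition is concave with a maximum at $p = 0.5$, while dropping the $p_i$ weights gives a convex curve with a minimum at $p = 0.5$.

```python
import numpy as np

# Bernoulli(p): compare the Shannon entropy with the "unweighted" sum of -log(p_i)
p = np.linspace(0.01, 0.99, 99)
shannon = -(p * np.log(p) + (1 - p) * np.log(1 - p))   # concave, max ~0.693 at p = 0.5
unweighted = -(np.log(p) + np.log(1 - p))              # convex, min ~1.386 at p = 0.5
print(shannon[49], unweighted[49])                      # values at p = 0.5
```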
Checklist
- [x] I have read the documentation (required)
- [x] I have checked that there is ~no~ similar issue in the repo (required)
Issue Analytics
- Created: 2 years ago
- Comments: 5 (1 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
As a side remark, we have a test that ensures this estimate is not too bad: https://github.com/DLR-RM/stable-baselines3/blob/master/tests/test_distributions.py#L79 (I think I took the idea from https://github.com/openai/baselines/blob/master/baselines/common/distributions.py#L323).
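For illustration, here is a minimal sketch of that kind of check (my own approximation of the idea behind the linked test, not the actual test code): the Monte Carlo estimate of $\mathbb{E}[-\log p(x)]$ should be close to the analytical entropy for a distribution where the latter is known.

```python
import torch as th

# Monte Carlo check: -mean(log_prob) should approximate the analytical entropy.
# Example distribution chosen here: a diagonal Gaussian (has a closed-form entropy).
dist = th.distributions.Normal(loc=th.zeros(3), scale=th.ones(3))
samples = dist.sample((100_000,))                 # shape: (100000, 3)
analytical = dist.entropy().sum()                 # sum over independent dimensions
estimate = -dist.log_prob(samples).sum(dim=1).mean()
assert th.allclose(analytical, estimate, rtol=1e-2), (analytical, estimate)
```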
Thanks very much for your detailed answers! Indeed, I missed that the code uses the analytical entropy of each distribution when it is available, which makes much more sense; I thought the approximation was also being used for my categorical distribution (illustrated in the sketch below). That was my confusion.
Regarding the current estimate… I don’t know SAC or the area mentioned on the Math Stack Exchange site, but I definitely agree with your analysis of the current approach. I’m happy and will close this issue, but a NotImplementedError, a link to this issue, or something similar could be nice, as this is a bit obscure (I’m not yet certain that the estimator works in the right direction; I think the curve is inverted if we assume p = 1). Feel free to reopen to discuss the point in parentheses.
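A minimal sketch of the distinction discussed above (my own illustration, not code from the issue): for a Categorical distribution, the analytical entropy exists, so the `else` branch of the quoted snippet applies; the Monte Carlo approximation is only needed for distributions without a closed-form entropy, such as the squashed Gaussian used by SAC.

```python
import torch as th

# Categorical distribution: analytical entropy is available, so no approximation is needed.
probs = th.tensor([0.7, 0.2, 0.1])
dist = th.distributions.Categorical(probs=probs)
entropy = dist.entropy()                 # -sum_i p_i * log(p_i)
entropy_loss = -th.mean(entropy)         # minimising this loss maximises the entropy
print(entropy.item(), entropy_loss.item())
```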