
KL Loss correction

See original GitHub issue

https://github.com/lucidrains/DALLE-pytorch/blob/995bfe1789243cbc838943cdc748daab406aae3e/dalle_pytorch/dalle_pytorch.py#L195

I am fairly certain that this should instead read

logits = rearrange(logits, 'b n h w -> (b h w) n')

since we are summing over the latent dimension, i.e. the probs/encoder outputs, and averaging over the observations, i.e. over every spatial position separately and for every sample in the batch. The docs are a little messy on this, but from what I understand, batchmean requires a reshaping such that all examples are condensed into the batch dimension.
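
For context, here is a minimal sketch of the proposed reshaping, assuming logits of shape (b, n, h, w) with n the codebook size and a uniform prior over the codes as the KL target; the shapes and variable names are illustrative, not a verbatim copy of the repository's code:

import math
import torch
import torch.nn.functional as F
from einops import rearrange

# Illustrative sizes: batch, codebook size, spatial height/width
b, n, h, w = 2, 512, 8, 8
logits = torch.randn(b, n, h, w)

# Proposed reshape: fold every spatial position of every sample into the batch
# dimension, so each row is one observation over the n codebook entries.
logits = rearrange(logits, 'b n h w -> (b h w) n')

log_qy = F.log_softmax(logits, dim=-1)                # encoder distribution q(y|x), in log space
log_uniform = torch.full_like(log_qy, -math.log(n))   # log of a uniform prior over the n codes

# F.kl_div(input, target) computes KL(target || input) pointwise; 'batchmean'
# sums the terms and divides by the size of the first dimension, i.e. the
# per-observation KL (summed over the n codes) averaged over b*h*w observations.
kl = F.kl_div(log_uniform, log_qy, reduction='batchmean', log_target=True)

With the (b h w) n layout, 'batchmean' divides by b*h*w, the number of observations; with a b (h w) n layout it would divide by b only, which is the discrepancy the issue points out.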

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 3
  • Comments: 12 (6 by maintainers)

Top GitHub Comments

2 reactions
CDitzel commented, Mar 20, 2021

I have no idea what's going on with OpenAI, but so far I don't think they are open to transparent research, as their name would suggest…

1 reaction
CDitzel commented, Mar 17, 2021

I have a headache, Phil. The math demands this term to be there, but when it is present the results are actually worse. Hate it…
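
For readers following along, the term under discussion appears to be the KL regularizer in the discrete VAE objective; a sketch of that objective, assuming the standard ELBO with a uniform categorical prior over a codebook of size K (my reading of the thread, not the repository's exact formulation):

\mathcal{L}(x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - \beta \, D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, \mathrm{Uniform}(K)\big)

With a uniform prior, the KL term equals \log K minus the entropy of q_\phi(z \mid x), so it simply pushes the encoder toward spreading probability mass evenly over the K codes.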

Read more comments on GitHub >

Top Results From Across the Web

How to Calculate the KL Divergence for Machine Learning
Nevertheless, we can calculate the KL divergence using the rel_entr() SciPy function and confirm that our manual calculation is correct.

Kullback–Leibler divergence - Wikipedia
A simple interpretation of the KL divergence of P from Q is the expected excess surprise from using Q as a model when...

Kullback-Leibler Divergence Explained - Count Bayesie
With KL divergence we can calculate exactly how much information is lost when we approximate one distribution with another.

Kullback-Leibler Divergence - an overview - ScienceDirect.com
If the loss function yields more value, it means the decoder does not generate ... DKL is a positive quantity and is equal...

What is the impact of scaling the KL divergence and ...
Variational autoencoders have two components in their loss function. The first component is the reconstruction ...
