
Regarding learned image embedding and text embedding in Unet


According to Section 2.1 (Decoder) of the paper:

We enable classifier-free guidance by randomly setting CLIP embeddings to zero (or a learned embedding) 10% of the time, and randomly dropping the text caption 50% of the time during training.

It seems that we are replacing the embeddings after turning them into condition sequences.

https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L1216-L1222
https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L1229-L1234
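The replacement step described above can be sketched roughly as follows (shapes and names are hypothetical, chosen only for illustration; `prob_mask_like` mimics the style of helper the repo uses, not its exact code): draw a per-sample keep mask, then swap the dropped condition sequences for a learned null embedding.

```python
import torch

def prob_mask_like(shape, prob):
    # boolean mask, True with probability `prob` per element
    return torch.zeros(shape).float().uniform_(0, 1) < prob

# hypothetical sizes for illustration
batch, seq_len, dim = 4, 16, 32
cond_drop_prob = 0.1

# condition sequence produced from the text encoder (stand-in values)
text_tokens = torch.randn(batch, seq_len, dim)

# learned null embedding, broadcast over the batch
null_text_embed = torch.nn.Parameter(torch.randn(1, seq_len, dim))

# keep the real condition for ~90% of batch elements, drop the rest
keep_mask = prob_mask_like((batch,), 1 - cond_drop_prob)
text_tokens = torch.where(
    keep_mask[:, None, None],  # broadcast over sequence and feature dims
    text_tokens,
    null_text_embed,
)
```

The key detail matching the question: the swap happens on the already-built condition sequence, per batch element, not on the raw CLIP embedding.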

And from the following, it seems that the null text embeddings can vary according to their sequence position. For image embeddings, I feel that is fine, but what about for text encodings?

https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L1104
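To make the concern concrete, here is a sketch of the two possible parameterizations (sizes are made up for illustration): a null parameter shaped `(1, max_text_len, dim)` learns a different null vector for each sequence position, whereas a `(1, 1, dim)` parameter broadcasts one shared null vector across all positions.

```python
import torch

dim, max_text_len = 32, 16

# per-position null embedding: each text position gets its own learned vector
null_per_position = torch.nn.Parameter(torch.randn(1, max_text_len, dim))

# position-independent alternative: a single learned vector,
# broadcast across all sequence positions
null_shared = torch.nn.Parameter(torch.randn(1, 1, dim))
broadcasted = null_shared.expand(1, max_text_len, dim)
```

Both produce a `(1, max_text_len, dim)` tensor at use time; the difference is only in how many parameters are learned and whether the "null" signal carries positional information.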

Also, it seems we may need two separate cond_drop_prob values: one for the image embedding and one for the text encodings. If we do that, how do we modify forward_with_cond_scale()?

https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L1166-L1178
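One possible answer, sketched below (this is an illustrative sketch, not the repo's actual code): keep the usual classifier-free-guidance structure of a conditional pass plus a fully-unconditional pass, and thread both drop probabilities through. The stand-in `forward` here just scales its input so the guidance arithmetic is visible.

```python
import torch

def forward(x, image_cond_drop_prob=0.0, text_cond_drop_prob=0.0):
    # stand-in for the real Unet forward; the scaling below only exists
    # to make the conditional vs. unconditional passes distinguishable
    scale = (1 - image_cond_drop_prob) + (1 - text_cond_drop_prob)
    return x * scale

def forward_with_cond_scale(x, cond_scale=1.0, **kwargs):
    # conditional pass, with whatever per-condition drop probs the caller set
    logits = forward(x, **kwargs)
    if cond_scale == 1:
        return logits
    # unconditional pass: drop BOTH conditions entirely
    null_logits = forward(x, image_cond_drop_prob=1.0, text_cond_drop_prob=1.0)
    # standard classifier-free guidance combination
    return null_logits + (logits - null_logits) * cond_scale
```

The point is that `forward_with_cond_scale` barely changes: the two drop probabilities matter during training, while at sampling time the unconditional pass simply sets both to 1.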

Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 1
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

xiankgx commented, Apr 30, 2022 (1 reaction)

lucidrains commented, Apr 30, 2022 (0 reactions)

@xiankgx haha, actually there was another issue with the null padding tokens, only uncovered because of your issues. https://github.com/lucidrains/DALLE2-pytorch/commit/1c1e508369da34eb35741558d33203f42fea006e should be ok now.

keep it coming! 🙏
