Why do we need to unnormalize the image when applying OpenAI's CLIP `embed_image`?


Hi Phil, I'm not sure we need to translate the image from (-1, 1) to (0, 1) here. The image passed in comes from the original dataset (PIL) and its range is already (0, 1), so I think we should avoid calling unnormalize() here: https://github.com/lucidrains/DALLE2-pytorch/blob/2db0c9794c33e98df25b84f557a683a8900dfc61/dalle2_pytorch/dalle2_pytorch.py#L281

When I ran a decoder training experiment, the images sampled from the Decoder turned out too bright (all pixel values shifted to a higher range).

Best,
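
To make the reported brightness shift concrete, here is a minimal sketch in plain Python. It assumes the usual affine maps between (0, 1) and (-1, 1); the two helper functions are defined locally for illustration and are not claimed to match the repository's code. Applying the inverse map to pixels that are already in (0, 1) pushes every value into the upper half of the range, which would look like an overly bright image.

```python
# Hypothetical helpers, defined here only for illustration.
def normalize_neg_one_to_one(x):
    """Map a pixel value from [0, 1] to [-1, 1]."""
    return x * 2.0 - 1.0


def unnormalize_zero_to_one(x):
    """Map a pixel value from [-1, 1] back to [0, 1]."""
    return (x + 1.0) * 0.5


# Pixel values as they would come from a dataset already scaled to [0, 1].
pixels = [0.0, 0.25, 0.5, 1.0]

# Round trip: normalize first, then unnormalize. Values are unchanged.
round_trip = [unnormalize_zero_to_one(normalize_neg_one_to_one(p)) for p in pixels]

# Mistake described in the issue: unnormalize applied to data already in [0, 1].
shifted = [unnormalize_zero_to_one(p) for p in pixels]

print("original  :", pixels)
print("round trip:", round_trip)
print("shifted   :", shifted)
```

In the shifted case, black (0.0) becomes mid-gray (0.5) and the whole range is compressed into the brighter half.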

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction
CiaoHe commented, May 14, 2022

wow what a speed! I just took a snap haha

Yeah, doing the normalize / unnormalize within the decoder class is enough (operating in p_sample_loop and p_losses).

1 reaction
lucidrains commented, May 14, 2022

ok all done in the latest! now nobody has to worry about this normalization / inverse normalization business 😃
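
The arrangement discussed above, where the decoder owns both conversions, can be sketched roughly as follows. This is a toy illustration and not the DALLE2-pytorch implementation: `ToyDecoder`, its method bodies, and the `prediction` argument are hypothetical, and plain Python lists stand in for tensors. Only the method names `p_losses` and `p_sample_loop` come from the thread.

```python
class ToyDecoder:
    """Hypothetical stand-in for a diffusion decoder (assumption, not the real class)."""

    @staticmethod
    def _normalize(x):
        # [0, 1] -> [-1, 1], applied once on the way into training.
        return x * 2.0 - 1.0

    @staticmethod
    def _unnormalize(x):
        # [-1, 1] -> [0, 1], applied once on the way out of sampling.
        return (x + 1.0) * 0.5

    def p_losses(self, image, prediction):
        """Callers pass images in [0, 1]; normalization happens here.

        `prediction` stands in for a network output in [-1, 1].
        Returns a mean squared error as a placeholder loss.
        """
        target = [self._normalize(p) for p in image]
        return sum((t - q) ** 2 for t, q in zip(target, prediction)) / len(target)

    def p_sample_loop(self, raw_samples):
        """Takes model-space samples in [-1, 1] and returns images in [0, 1]."""
        clamped = [max(-1.0, min(1.0, s)) for s in raw_samples]
        return [self._unnormalize(s) for s in clamped]


decoder = ToyDecoder()
loss = decoder.p_losses([0.0, 0.5, 1.0], [-1.0, 0.0, 1.0])
images = decoder.p_sample_loop([-1.0, 0.0, 1.0])
print(loss, images)
```

With both conversions kept inside the class, callers only ever handle images in (0, 1), so code such as a CLIP image embedder would have no reason to unnormalize its input.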

