
Why is unnormalize needed on the image when applying OpenAI's CLIP `embed_image`?


Hi Phil, I doubt whether we need to map the image from (-1, 1) to (0, 1) here. Since the image passed in comes from the original dataset (PIL) and its range is already (0, 1), I think we should avoid calling unnormalize() here: https://github.com/lucidrains/DALLE2-pytorch/blob/2db0c9794c33e98df25b84f557a683a8900dfc61/dalle2_pytorch/dalle2_pytorch.py#L281

When I ran a decoder training experiment, the images sampled from the Decoder turned out too bright (all pixel values shifted to a higher range).

Best,
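To illustrate why the samples could come out too bright, here is a minimal sketch in plain Python. It assumes `unnormalize` is the usual affine map from (-1, 1) to (0, 1); the function below is a hypothetical stand-in, not the code at the linked line. Applying that map to pixels that are already in (0, 1) pushes every value into the upper half of the range.

```python
def unnormalize(x):
    # Hypothetical stand-in: maps a value from [-1, 1] to [0, 1].
    return (x + 1) * 0.5

# Pixel values as they would come from a dataset already scaled to [0, 1].
pixels = [0.0, 0.25, 0.5, 1.0]

# Unnormalizing a second time squeezes everything into [0.5, 1.0].
shifted = [unnormalize(p) for p in pixels]
print(shifted)  # [0.5, 0.625, 0.75, 1.0]
```

Black (0.0) becomes mid-gray (0.5), which matches the "shifted to a higher range" symptom described above.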

Issue Analytics

State:
Created a year ago
Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction

CiaoHe commented, May 14, 2022

wow what a speed! I just took a snap haha

Yeah, doing the normalize / unnormalize within the decoder class is enough (in `p_sample_loop` and `p_losses`).
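A rough sketch of that idea, under stated assumptions: `ToyDecoder`, `normalize`, and `unnormalize` are hypothetical names operating on plain lists, and the loss is a placeholder. This is not the DALLE2-pytorch implementation; only the method names `p_losses` and `p_sample_loop` come from the comment above. The point is that callers only ever deal with images in [0, 1], while the [-1, 1] range stays internal to the decoder.

```python
def normalize(img):
    # [0, 1] -> [-1, 1]
    return [v * 2 - 1 for v in img]

def unnormalize(img):
    # [-1, 1] -> [0, 1]
    return [(v + 1) * 0.5 for v in img]

class ToyDecoder:
    """Hypothetical decoder that owns both range conversions."""

    def p_losses(self, image_zero_to_one):
        # Training path: convert the dataset image to the model's range first.
        x_start = normalize(image_zero_to_one)
        # Placeholder loss (mean squared value), standing in for the real one.
        return sum(v * v for v in x_start) / len(x_start)

    def p_sample_loop(self, model_output):
        # Sampling path: model_output stands in for the final sample in [-1, 1];
        # convert it back to [0, 1] before handing it to the caller.
        return unnormalize(model_output)

decoder = ToyDecoder()
loss = decoder.p_losses([0.0, 0.5, 1.0])
sample = decoder.p_sample_loop([-1.0, 0.0, 1.0])
print(sample)  # [0.0, 0.5, 1.0]
```

With this layout, code outside the decoder (such as the CLIP image embedding step) would not need to call unnormalize itself.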

1 reaction

lucidrains commented, May 14, 2022

ok all done in the latest! now nobody has to worry about this normalization / inverse normalization business 😃


