
Bad generations from `generate.py`

See original GitHub issue

Thanks for the repo.

I trained DALL-E on the Visual Genome dataset. During training, one of the generations is shown below:

[training-phase sample image]

But when I generate an image with generate.py, the generated images are nonsense, even though I use text that also appeared in the training phase.

The commands I use:

# punctuation marks separated from words by spaces
python generate.py --dalle_path ./dalle.pt --text "tire on bus . window on bus . window on bus . window on bus . window on bus . pole in grass . window on bus ."

and

# punctuation marks not separated from words
python generate.py --dalle_path ./dalle.pt --text "tire on bus. window on bus. window on bus. window on bus. window on bus. pole in grass. window on bus."

The results of both commands are similar, and the generations are:

[generated sample images]

I have checked that the model weights are loaded correctly. Any thoughts on this issue?

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 10 (10 by maintainers)

Top GitHub Comments

3 reactions
ylsung commented, Mar 27, 2021

@afiaka87 It turns out it’s because the way we save images differs between train_dalle.py and generate.py.

In train_dalle.py we use wandb.Image to process the image, and it automatically normalizes and scales the image: https://github.com/wandb/client/blob/9cc04578ebc6d593450e9dbbcae07452bf7bec35/wandb/sdk/data_types.py#L1676-L1679

However, in generate.py we use torchvision.utils.save_image, which won’t normalize the image to (0, 1) unless we pass normalize=True. Since the VAE’s output range is roughly within -1 and 1, if we don’t normalize the image, save_image directly converts the float values to uint8 via

ndarr = grid.mul(255).add_(0.5).clamp_(0, 255).permute(1, 2, 0).to('cpu', torch.uint8).numpy()

Hence, lots of pixels that were originally smaller than 0 become 0, which is why a big part of each image is black.
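
For illustration, here is a minimal, self-contained sketch (not taken from the repository) of the effect described above: a random tensor standing in for the VAE output, saved once without and once with normalize=True.

# Minimal sketch: a fake VAE output with values roughly in (-1, 1)
import torch
from torchvision.utils import save_image

fake_vae_output = torch.rand(3, 64, 64) * 2 - 1  # roughly uniform in (-1, 1)

# Without normalize: mul(255).clamp_(0, 255) sends every negative pixel to 0 (black)
save_image(fake_vae_output, 'clamped.png')

# With normalize=True: the tensor is first rescaled to (0, 1), so the full range survives
save_image(fake_vae_output, 'normalized.png', normalize=True)

# About half of the values are negative, so 'clamped.png' comes out mostly black
print((fake_vae_output < 0).float().mean())  # ~0.5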

Here are some results; the config I use is

EPOCHS = 20
BATCH_SIZE = 8
LEARNING_RATE = 3e-4
GRAD_CLIP_NORM = 0.5

MODEL_DIM = 256 # 512
TEXT_SEQ_LEN = 64 # 256
DEPTH = 32
HEADS = 16
DIM_HEAD = 64
REVERSIBLE = False
ATTN_TYPES = None

And the outputs of generate.py, given the text “frame on wall. lamp by bed. wall on building.”, are: [five generated samples]

After I add normalize=True, i.e. save_image(image, outputs_dir / f'{i}.jpg', normalize=True), the outputs are: [five generated samples]

Looks much better.

BTW, the mask also seems important for generation: output = dalle.generate_images(text_chunk, mask = mask, filter_thres = args.top_k). The above results were generated with the mask passed in, but I haven’t dug into this too much.
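
For reference, a rough sketch of how such a padding mask might be built; the pad token id of 0 and the toy tokenization here are assumptions for illustration, not necessarily what generate.py actually does.

# Hypothetical sketch of building a padding mask for generate_images
# Assumes pad positions are filled with token id 0; the real tokenizer/pad id may differ
import torch

text_seq_len = 64
token_ids = [12, 845, 33, 7]                      # toy encoded caption
padded = token_ids + [0] * (text_seq_len - len(token_ids))
text_chunk = torch.tensor(padded).unsqueeze(0)    # shape (1, text_seq_len)

# True where there is a real token, False on padding, so pad positions are excluded
mask = text_chunk != 0

# Passed the same way as in the comment above:
# output = dalle.generate_images(text_chunk, mask = mask, filter_thres = args.top_k)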


Edit: Generations without masks (original implementation): [five generated samples]

The results are quite weird, and I cannot even relate the images to the text.

Generations with masks: [five generated samples]

We can see some beds and lamps in the images, so the quality is higher than without masks. It seems the pad tokens influence the results a lot, so we need a mask to exclude them.

2 reactions
afiaka87 commented, Mar 28, 2021

This is great work and an obvious opportunity to submit a pull request if you’d like, @louis2889184.

