Bad generations from `generate.py`
See original GitHub issue

Thanks for the repo.
I trained DALL-E on the Visual Genome dataset. One of the generations produced during training is shown below.

But when I generate an image with `generate.py`, the generated images are nonsense, even though I use text that also appeared during training.
The scripts I use:

```sh
# separate the punctuation marks from the words
python generate.py --dalle_path ./dalle.pt --text "tire on bus . window on bus . window on bus . window on bus . window on bus . pole in grass . window on bus ."
```

and

```sh
# don't separate the punctuation marks from the words
python generate.py --dalle_path ./dalle.pt --text "tire on bus. window on bus. window on bus. window on bus. window on bus. pole in grass. window on bus."
```
The results of both scripts are similar, and the generations are:

I have checked that the model weights are loaded correctly. Any thoughts on this issue?
Issue Analytics
- State:
- Created 2 years ago
- Reactions: 1
- Comments: 10 (10 by maintainers)
Top GitHub Comments
@afiaka87 It turns out it’s because the way we save images differs between `train_dalle.py` and `generate.py`.

In `train_dalle.py` we use `wandb.Image` to process the image, and it automatically normalizes and scales it: https://github.com/wandb/client/blob/9cc04578ebc6d593450e9dbbcae07452bf7bec35/wandb/sdk/data_types.py#L1676-L1679

However, in `generate.py` we use `torchvision.utils.save_image`, which won’t normalize the image to (0, 1) unless we pass the argument `normalize=True`. Also, the VAE’s output range is roughly within -1 and 1, so if we don’t normalize, `save_image` directly converts the floats to uint8 (scaling by 255 and clamping to [0, 255]). Hence, lots of pixels that were originally smaller than 0 become 0. That’s why a big part of the image is black.
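To make the failure mode concrete, here is a minimal NumPy sketch (not the repo’s code) of the quantization `save_image` effectively performs — scale by 255, clamp to [0, 255], cast to uint8 — compared with min-max normalizing to (0, 1) first, as `normalize=True` does:

```python
import numpy as np

# Simulated VAE output: floats roughly in [-1, 1]
pixels = np.array([-0.8, -0.2, 0.0, 0.5, 1.0])

# Without normalization: scale by 255, clamp to [0, 255], cast to uint8.
# Every negative value collapses to 0, i.e. a black pixel.
raw = np.clip(pixels * 255 + 0.5, 0, 255).astype(np.uint8)
print(raw)  # [  0   0   0 128 255]

# With min-max normalization to (0, 1) first (what normalize=True does),
# the full dynamic range of the output survives quantization.
norm = (pixels - pixels.min()) / (pixels.max() - pixels.min())
scaled = np.clip(norm * 255 + 0.5, 0, 255).astype(np.uint8)
print(scaled)  # [  0  85 113 184 255]
```

The first path turns nearly half of the value range into pure black, which matches the mostly-black generations above.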
Here are some results; the config I use is:
And the outputs of `generate.py`, given the text “frame on wall. lamp by bed. wall on building.”, are:

After I add `normalize=True`, as in `save_image(image, outputs_dir / f'{i}.jpg', normalize=True)`, the outputs are:

They look much better.
BTW, the `mask` also seems important for generation: `output = dalle.generate_images(text_chunk, mask = mask, filter_thres = args.top_k)`. The results above were generated with the mask as input, but I haven’t dug into this too much.

Edit:

Generations without masks (original implementation):

The results are quite weird, and I cannot even align the images with the text.

Generations with masks:

We can see some beds and lamps in the images, so the quality is higher than without masks. It seems that the `pad` token influences the results a lot, so we need a mask to exclude it.

This is great work and an obvious opportunity to submit a pull request if you’d like. @louis2889184
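For reference, a minimal sketch of how such a padding mask can be built before calling `dalle.generate_images(..., mask=mask)`. The pad id of 0 and the token ids below are hypothetical, chosen only to illustrate the idea of masking out `pad` positions:

```python
import numpy as np

PAD_ID = 0  # assumed padding token id

def make_text_mask(token_ids):
    """Boolean mask: True for real tokens, False for padding,
    so the model can ignore <pad> positions during generation."""
    return np.array([t != PAD_ID for t in token_ids])

# "frame on wall" tokenized (hypothetical ids), padded to length 8
tokens = [12, 7, 31, 0, 0, 0, 0, 0]
mask = make_text_mask(tokens)
print(mask.tolist())  # [True, True, True, False, False, False, False, False]
```

With such a mask, attention over the trailing `pad` tokens is suppressed, which would explain the quality gap between the masked and unmasked generations above.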