More "OpenAI Blog Post" Training | Depth 32 | Heads 8 | LR 5e-4

Edit: Moved to discussions: https://github.com/lucidrains/DALLE-pytorch/discussions/106

Hey, all. Some of you might know I’m practicing and learning about machine learning with dalle-pytorch and a dataset consisting of the images OpenAI presented in the DALLE blog post. I honestly don’t have the money to train on this whole dataset.

Edit: this is no longer true. Using the 1024 VQGAN from the “Taming Transformers” research, it’s now quite possible to train on a full dataset of 1,000,000 image-text pairs, and I’m doing just that. I hope to have it finished in about a week. I assume someone else will release a dalle-pytorch trained properly on COCO and other image sets before then, but if they don’t, check here for updates.

Anyway, it ran for ~36000 steps. As you can see, it… still really likes mannequins. I’m considering removing them from the dataset. But you’ll also notice that the network has actually got a decent idea of the general colors that belong in each type of prompt.

Some Samples from Near the End of Training

results

Every Text-Image Reconstruction

https://wandb.ai/afiaka87/dalle_pytorch_live_training/reports/dalle-pytorch-Test-Run-2--Vmlldzo1MzM5MjQ

Deliverables (my train_dalle.py)

https://gist.github.com/afiaka87/850fb3cc48edde8a7ed4cb1ce53b6bd2

This has some code in it that actually manages to deal with truncated images via try/except. Apparently detecting a corrupted PNG is harder than P vs NP. PIL’s Image.verify() method doesn’t catch all of them. Python’s built-in imghdr library doesn’t catch all of them either. So you just sort of catch OSError and return an item further along. Works well enough.
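
A minimal sketch of that catch-and-skip idea, not the gist's actual code: the class name `SkipCorruptDataset`, the `loader` and `max_skips` parameters, and the wrap-around at the end of the list are all assumptions made for illustration. The default loader assumes Pillow is installed and forces a full decode, so a truncated file raises inside the `try` block instead of later in training.

```python
from typing import Any, Callable, Sequence


def load_with_pil(path: str) -> Any:
    """Default loader (assumes Pillow): decode fully so truncation shows up here."""
    from PIL import Image  # imported lazily so other loaders need no Pillow

    with Image.open(path) as img:
        return img.convert("RGB")  # convert() forces the pixel data to be read


class SkipCorruptDataset:
    """Hypothetical map-style dataset that skips files raising OSError."""

    def __init__(
        self,
        paths: Sequence[str],
        loader: Callable[[str], Any] = load_with_pil,
        max_skips: int = 10,
    ) -> None:
        self.paths = list(paths)
        self.loader = loader
        self.max_skips = max_skips

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int) -> Any:
        if not self.paths:
            raise IndexError("dataset is empty")
        # Try the requested item first, then items further along (wrapping around).
        for offset in range(self.max_skips + 1):
            candidate = (index + offset) % len(self.paths)
            try:
                return self.loader(self.paths[candidate])
            except OSError:
                continue  # corrupt or truncated file: move on to the next one
        raise RuntimeError(f"no readable item within {self.max_skips} skips of {index}")
```

One side effect of this design is that a readable neighbour can be returned more than once per epoch, which is usually acceptable when corrupt files are rare.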

Parameters

SHUFFLE = True
EPOCHS = 28 # This wound up being less than a single epoch, of course. 
BATCH_SIZE = 16
LEARNING_RATE = 0.0005 # I found this learning rate to be more suitable than 0.0003 in my hyperparameter sweep post
GRAD_CLIP_NORM = 0.5
DEPTH = 32
HEADS = 8
MODEL_DIM = 512
TEXT_SEQ_LEN = 256
DIM_HEAD = 64
REVERSIBLE = True
ATTN_TYPES = ('full',)
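
One Python detail worth keeping in mind with settings like these: a trailing comma creates a tuple, while parentheses on their own do not. The short check below illustrates this; the `_WITH_COMMA` and `_STRING` names are hypothetical and exist only for the comparison.

```python
REVERSIBLE_WITH_COMMA = True,   # trailing comma: this is the tuple (True,)
REVERSIBLE = True               # plain bool

ATTN_TYPES_STRING = ('full')    # parentheses only: this is the string 'full'
ATTN_TYPES = ('full',)          # one-element tuple


def describe(value):
    """Return the type name of a setting, for a quick sanity check."""
    return type(value).__name__


for name in ("REVERSIBLE_WITH_COMMA", "REVERSIBLE", "ATTN_TYPES_STRING", "ATTN_TYPES"):
    print(name, "->", describe(globals()[name]))
```

A tuple such as `(True,)` is still truthy, so a mistake like this can go unnoticed until something checks the type or iterates over the value.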

Dataset Description

https://github.com/lucidrains/DALLE-pytorch/issues/61#issuecomment-796663342

Just for more info on the dataset itself, it is roughly 1,100,000 256x256 image-text pairs that were generated by OpenAI’s DALL-E. They presented roughly 30k unique text prompts, of which they posted the top 32 of 512 generations on https://openai.com/blog/dall-e/. Many images were corrupt, and not every prompt has a full 32 examples, but the total number of images winds up being about 1.1 million. If you look at many of the examples on that page, you’ll see that DALL-E (in that form at least) can and will make mistakes. These mistakes are also in this dataset. Anyway, I’m just messing around, having fun training and whatnot. This is definitely not going to produce a good model or anything.

There are also a large number of images in the dataset which are intended to be used with the “mask” feature. I don’t know if that’s possible yet in DALLE-pytorch though. Anyway, that can’t be helping much.
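
As a rough check on the "less than a single epoch" remark, the figures quoted in this post (about 1.1 million pairs, batch size 16, ~36000 steps) can be plugged in directly. This assumes one optimizer step per batch of 16 and that every pair is usable, which the corrupt images make slightly optimistic.

```python
import math

DATASET_SIZE = 1_100_000  # approximate number of image-text pairs
BATCH_SIZE = 16
STEPS_RUN = 36_000        # approximate length of the run described above

steps_per_epoch = math.ceil(DATASET_SIZE / BATCH_SIZE)
fraction_of_epoch = STEPS_RUN / steps_per_epoch

print(f"{steps_per_epoch} steps per epoch, {fraction_of_epoch:.2f} epochs covered")
# prints: 68750 steps per epoch, 0.52 epochs covered
```

So under these assumptions the run saw roughly half of the dataset once.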

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 17
  • Comments: 31 (28 by maintainers)

Top GitHub Comments

3 reactions
afiaka87 commented, Mar 17, 2021

On the other hand, we might have a better / faster transformer with 1024 VQGAN, which might speed up things a little bit.

@robvanvolt Here’s some early results from training on that dataset by the way. I think we should definitely clean it up with the info from OpenAI. https://wandb.ai/afiaka87/OpenImagesV6/reports/dalle-pytorch-OpenImagesV6-With-Localized-Annotations---Vmlldzo1MzgyMTU

After ~15k iters, I stopped training, added the COCO2018 dataset, and resumed from there for another ~6k steps. https://wandb.ai/afiaka87/OpenImagesV6/reports/OpenImagesV6-COCO--Vmlldzo1MzgyNTI

@lucidrains @Jinglei5

2 reactions
afiaka87 commented, Mar 17, 2021

@Jinglei5 also currently in the (very lengthy) process of converting all of these to 256px jpegs so I can actually move them around a bit. Do you have an existing workflow for that? Right now I’m just using imagemagick convert in a for loop.

Sorry, I don’t have the workflow. I just sampled 10,000 of them to feed the model directly for a trial right now. ><

Ha I do that as well. It is insane to me the number of things that just straight up break when you’re dealing with lots of files.

It’s all good though, I managed to figure it out:

find . -type f -name "*.jpg" | parallel mogrify -resize 256x {}
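
For anyone who would rather stay in Python, here is a rough equivalent of that one-liner, assuming Pillow is installed. The function names are hypothetical. Note that `-resize 256x` fixes only the width and keeps the aspect ratio, so non-square inputs do not come out as 256x256; the sketch mirrors that behaviour.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def scaled_size(width, height, target_width=256):
    """Size after fitting the width to target_width, keeping the aspect ratio."""
    return target_width, max(1, round(height * target_width / width))


def resize_in_place(path):
    """Overwrite one image with its resized version (assumes Pillow)."""
    from PIL import Image  # imported lazily so scaled_size works without Pillow

    with Image.open(path) as img:
        resized = img.resize(scaled_size(*img.size))
    resized.save(path)
    return path


def resize_tree(root="."):
    """Resize every *.jpg under root, using one worker process per CPU core."""
    paths = [str(p) for p in Path(root).rglob("*.jpg")]
    with ProcessPoolExecutor() as pool:
        return list(pool.map(resize_in_place, paths, chunksize=64))
```

Call `resize_tree()` from under an `if __name__ == "__main__":` guard, since the process pool re-imports the module on some platforms. Like the `mogrify` command, this overwrites the originals, so it is worth running on a copy first.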
