
Why does train_text_to_image.py perform so differently from the CompVis script?

See original GitHub issue

I posted about this on the forum but didn’t get any useful feedback - would love to hear from someone who knows the ins and outs of the diffusers codebase!

https://discuss.huggingface.co/t/discrepancies-between-compvis-and-diffuser-fine-tuning/25556

To summarize the post: the train_text_to_image.py script and the original CompVis repo perform very differently when fine-tuning on the same dataset with the same hyperparameters. I’m trying to reproduce the Lambda Labs Pokemon fine-tuning results and having difficulty doing so (picture results in forum post).

I’ve been digging into the implementations and I’m not noticing any obvious differences in how the models are trained, how the losses are calculated, etc. - so what explains the large behavioral discrepancies?

Would really appreciate any insight on what might be causing this.

Issue Analytics

  • State: open
  • Created: 10 months ago
  • Comments: 12 (8 by maintainers)

Top GitHub Comments

3 reactions
patil-suraj commented, Nov 8, 2022

Thanks for posting the detailed issue @john-sungjin!

As you said, the implementation is very similar to the CompVis one. The one difference that I’m aware of is that in the CompVis script (for example, the Pokemon fine-tuning script) the model is initialised from the sd-v1-4-full-ema.ckpt checkpoint, so it loads the non-EMA weights for training and the EMA weights for the EMA updates. In the diffusers script, by contrast, the EMA checkpoint is used for both training and EMA.

I am going to add an option that enables loading both the non-EMA weights (for training) and the EMA weights (for EMA updates) in the diffusers script and then compare again. Will report here as soon as possible 😃
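The distinction matters because EMA weights are a smoothed average of the training trajectory, not the weights the optimizer was actually stepping on, so resuming fine-tuning from them starts from a different point than the CompVis setup. A minimal plain-Python sketch of the two-copy pattern described above (names are hypothetical; the actual diffusers script uses an EMAModel helper and torch tensors):

```python
# Sketch of the two-copy EMA pattern: one copy of the weights is trained,
# a second copy is updated as an exponential moving average (EMA) of the
# first. Fine-tuning should resume from the *trained* copy; the EMA copy
# is kept for inference/averaging only.

def ema_update(ema_weights, train_weights, decay=0.9999):
    """Blend the EMA copy toward the current training weights."""
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, train_weights)]

# Toy example with one "parameter" and a low decay so the lag is visible.
train = [1.0]
ema = list(train)          # both copies start from the same checkpoint

train = [2.0]              # pretend an optimizer step moved the weight
ema = ema_update(ema, train, decay=0.9)
# ema is now [1.1]: it lags behind the training weights, which is why
# training from the EMA copy behaves differently from training from the
# non-EMA copy stored alongside it in sd-v1-4-full-ema.ckpt.
```

With the default decay of 0.9999, the EMA copy changes very slowly, so a script that both trains and averages the same EMA weights effectively fine-tunes a heavily smoothed model.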

1 reaction
patil-suraj commented, Nov 21, 2022

Going to update the script soon. I am getting good results with the script now - see, for example, the emoji model.


Top Results From Across the Web

`txt2img.py` killed upon start · Issue #352 · CompVis/stable ...
I'm running an RTX 3090 with a Ryzen 5 3600 and 16 GB RAM on Arch. Every time I try to run python scripts/txt2img.py --prompt...

Discrepancies between CompVis and Diffuser fine-tuning?
I'm finding that the results are drastically different; with the same hyperparameters (LR, batch size, gradient accum, etc.), the Diffusers script is ...

How to Run Stable Diffusion Locally to Generate Images
This article will show you how to install and run Stable Diffusion, both on GPU and CPU, so you can get started generating...
