[Community] Testing Stable Diffusion is hard 🄔
It's really difficult to test Stable Diffusion due to the following:
- Continuous output: Diffusion models take float values as input and produce float values as output. This is different from NLP models, which tend to take int64 as inputs and produce int64 as outputs.
- Huge output dimensions: If an image has an output size of `(1, 512, 512, 3)`, there are 512 * 512 * 3 ≈ 800,000 values that need to be within a given range. Say you want to test for a maximum difference of `(pred - ref).abs() < 1e-3`: we have roughly a million values where this has to hold true. This is quite different from NLP, where we rather test things like text generation or final logit layers, which are usually no bigger than a dozen or so tensors of size 768 or 1024.
- Error propagation: We cannot simply test one forward pass of Stable Diffusion, because in practice people use 50 forward passes. Error propagation becomes a real problem in this case (a toy sketch follows below). This again is different from, say, generation in NLP, because there errors can be somewhat "smoothed" out at every generation step, since an "argmax" of the "softmax" output is applied after each step.
- Composite systems: Stable Diffusion has three main components for inference: a UNet, a scheduler, and a VAE decoder. The UNet and the scheduler are very entangled during the forward pass. Just because we know the forward passes of the scheduler and the UNet work independently doesn't mean that using them together works.
=> Therefore, we need to do full integration tests, meaning we need to make sure that the output of a full denoising process stays within a given error range. At the moment, though, we're having quite a few problems getting fully reproducible results on different GPUs, CUDA versions, etc. (especially for FP16).
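To make the error-propagation point concrete, here is a toy sketch in plain PyTorch (not diffusers code): run the same 50-step iteration once in float32 and once in float64 from an identical starting point, and watch the per-step rounding error accumulate.

```python
import torch

torch.manual_seed(0)
x64 = torch.randn(1, 4, 64, 64, dtype=torch.float64)
x32 = x64.float()  # identical starting point, lower precision

for t in range(50):
    # Stand-in for one denoising step (UNet forward + scheduler update).
    x64 = x64 + 0.1 * torch.sin(x64 + t)
    x32 = x32 + 0.1 * torch.sin(x32 + t)

# The accumulated difference is typically much larger than the ~1e-7 rounding
# error of a single float32 step, so a per-step tolerance tells you little
# about the error after a full 50-step denoising loop.
print((x32.double() - x64).abs().max())
```

The same mechanism is what makes FP16 so hard to pin down: each step's deviation feeds into the next step's input.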
That being said, it is extremely important to test Stable Diffusion to avoid issues like https://github.com/huggingface/diffusers/issues/902 in the future, while still being able to improve speed with PRs like https://github.com/huggingface/diffusers/pull/371.
At the moment, we're running multiple integration tests for all 50 diffusion steps every time a PR is merged to master, see:
- https://github.com/huggingface/diffusers/blob/31af4d17e81308887ff63080d49fba644e6c3963/tests/pipelines/stable_diffusion/test_stable_diffusion.py#L518
- https://github.com/huggingface/diffusers/blob/31af4d17e81308887ff63080d49fba644e6c3963/tests/pipelines/stable_diffusion/test_stable_diffusion_img2img.py#L465
- https://github.com/huggingface/diffusers/blob/31af4d17e81308887ff63080d49fba644e6c3963/tests/pipelines/stable_diffusion/test_stable_diffusion_inpaint.py#L262
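Schematically, those tests follow roughly this pattern (a condensed sketch with placeholder expected values; see the linked files for the real code): run the full pipeline with a fixed seed and compare a small slice of the output image against reference values recorded from a known-good run.

```python
import numpy as np
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cuda")

generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(
    "A painting of a squirrel eating a burger",
    generator=generator,
    num_inference_steps=50,
    output_type="np",
).images[0]  # (512, 512, 3) float array in [0, 1]

assert image.shape == (512, 512, 3)

# Compare a small corner slice against values recorded from a known-good run.
# The numbers below are placeholders, not real reference values.
image_slice = image[-3:, -3:, -1].flatten()
expected_slice = np.array([0.57, 0.56, 0.47, 0.54, 0.56, 0.46, 0.51, 0.53, 0.45])
assert np.abs(image_slice - expected_slice).max() < 1e-2
```

Checking only a 3x3 slice keeps the reference values manageable, but it also means that a regression affecting other parts of the image can slip through, which is exactly the coverage trade-off discussed above.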
Nevertheless, the tests weren't sufficient to detect https://github.com/huggingface/diffusers/issues/902.
Testing puzzle 🧩: How can we find the best trade-off between fast & inexpensive tests and the best possible test coverage, taking the above points into account?
We've already looked quite a bit into https://pytorch.org/docs/stable/notes/randomness.html.
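The settings from those notes that matter most here, collected in one place (a sketch; full cross-machine reproducibility on GPU also depends on the CUDA/cuDNN versions in use):

```python
import os

# Must be set before CUDA initialization for deterministic cuBLAS (CUDA >= 10.2).
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import torch

torch.manual_seed(0)                      # seeds the CPU and all GPU RNGs
torch.use_deterministic_algorithms(True)  # error out on nondeterministic ops
torch.backends.cudnn.benchmark = False    # disable cuDNN conv autotuning
```

Note that even with all of this, determinism is only guaranteed on the same hardware and software stack, which is why matching reference outputs across different GPUs remains hard.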
Top GitHub Comments
That's a very good question - we also ran into this problem with @anton-l
In short, for public PRs we don't test any models that require auth_token verification. The reason is that public PRs cannot have access to the secret token of our GitHub repo, which means those PRs would fail (please correct me if I'm wrong @anton-l).
When we merge to "main", we always have access to our secret GitHub token and can then run the tests on those models.
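A sketch of how such a guard can look in pytest (the environment variable name here is just an example):

```python
import os
import pytest

# Skip tests that need a gated model unless an auth token is available,
# e.g. on public PRs where repository secrets are not exposed to CI.
requires_auth = pytest.mark.skipif(
    os.getenv("HUGGING_FACE_HUB_TOKEN") is None,
    reason="requires a Hugging Face auth token (not available on public PRs)",
)

@requires_auth
def test_full_denoising_loop():
    ...
```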
My 2c on this: ideally you have some fast unit tests that cover baseline correctness issues, but for the vast majority of the issues you're highlighting you need time-consuming runs, so there are two things that help dramatically in my experience.
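On the fast-unit-test side, one common pattern (a sketch, assuming diffusers' UNet2DModel API, with a deliberately tiny randomly-initialized model) is to check shapes, dtypes, and basic numerical sanity in milliseconds:

```python
import torch
from diffusers import UNet2DModel

# A tiny, randomly initialized UNet: this checks wiring and numerical
# sanity, not image quality.
model = UNet2DModel(
    sample_size=32,
    in_channels=3,
    out_channels=3,
    layers_per_block=1,
    block_out_channels=(32, 64),
    down_block_types=("DownBlock2D", "AttnDownBlock2D"),
    up_block_types=("AttnUpBlock2D", "UpBlock2D"),
)

sample = torch.randn(1, 3, 32, 32)
out = model(sample, timestep=10).sample

assert out.shape == sample.shape
assert torch.isfinite(out).all()  # catch NaNs/Infs from broken layers
```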
Finally, a lot of these problems become easier to handle if the models are faster, either by changing the models or by using training compilers, so a focus on speed and test time will be crucial to keep this process sane.