Potential regression in deterministic outputs

Describe the bug

I’ve started noticing different outputs starting with diffusers 0.4.0, compared against 0.3.0. This is my test code (extracted from a notebook):

import diffusers
from diffusers import StableDiffusionPipeline, DDIMScheduler
import torch
from IPython.display import display

def run_tests(pipe):
    # Re-seed the global RNG before each prompt so every call starts from the same state.
    torch.manual_seed(1000)
    display(pipe("A photo of Barack Obama smiling with a big grin").images[0])
    torch.manual_seed(1000)
    display(pipe("Labrador in the style of Vermeer").images[0])

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cuda")
run_tests(pipe)

The first prompt produces identical results. The second one, however, results in different outputs:

0.3.0: [image: labrador_0 3]

main@a3efa433eac5feba842350c38a1db29244963fb5: [image: labrador_0 6]
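Comparing the renders by eye only goes so far, so here is a minimal sketch of how the difference between two outputs could be quantified. The helper name `pixel_diff_stats` is hypothetical, and it assumes both images have already been flattened into equal-length sequences of 0-255 values (for example the bytes returned by Pillow's `Image.tobytes()`); the toy buffers below only stand in for real renders.

```python
def pixel_diff_stats(a, b):
    """Return (max_abs_diff, n_differing) for two equal-length pixel sequences."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of values")
    # Per-value absolute difference between the two renders.
    diffs = [abs(x - y) for x, y in zip(a, b)]
    max_diff = max(diffs, default=0)
    n_diff = sum(1 for d in diffs if d)
    return max_diff, n_diff


# Toy data standing in for the flattened pixels of two renders (assumption).
old = bytes([10, 20, 30, 40])
new = bytes([10, 21, 30, 44])
print(pixel_diff_stats(old, new))  # (4, 2)
```

A result of `(0, 0)` would mean the two versions produced identical pixels; a small maximum difference over a few pixels points to numerical noise rather than a different image.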

Using DDIM, both prompts generate different images.

scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", scheduler=scheduler)
pipe = pipe.to("cuda")
run_tests(pipe)

DDIM 0.3.0: [image: obama_ddim_0 3]

DDIM main: [image: obama_ddim_0 6]

DDIM 0.3.0: [image: labrador_ddim_0 3]

DDIM main: [image: labrador_ddim_0 6]

In addition, there’s this post from a forum user who sees very different results in the img2img pipeline: https://discuss.huggingface.co/t/notable-differences-between-other-implementations-of-stable-diffusion-particularly-in-the-img2img-pipeline/24635/5. They also recently opened another issue, #901. I’m cross-referencing it here; it may or may not be related to this one.

Reproduction

As explained above.

Logs

No response

System Info

diffusers: main @ a3efa433eac5feba842350c38a1db29244963fb5 vs v0.3.0

Issue Analytics

Created: a year ago
Reactions: 1
Comments: 19 (17 by maintainers)

Top GitHub Comments

2 reactions

patrickvonplaten commented, Nov 30, 2022

Once the pipeline tests are fully updated we should also make a doc explaining the problem with reproducibility in general with diffusion models. cc @anton-l

2 reactions

patrickvonplaten commented, Oct 31, 2022

Small update here:

1.) We now know that we cannot guarantee exact reproducibility (only loosely “close” reproducibility) because of https://github.com/pytorch/pytorch/issues/87992, so we can never really guarantee that the exact same images are generated across devices.
2.) I checked, and I cannot reproduce a difference with this code:

import diffusers
from diffusers import StableDiffusionPipeline, DDIMScheduler
import torch
from IPython.display import display

def run_tests(pipe):
    torch.manual_seed(1000)
    display(pipe("A photo of Barack Obama smiling with a big grin").images[0])
    torch.manual_seed(1000)
    display(pipe("Labrador in the style of Vermeer").images[0])

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cuda")
run_tests(pipe)

between 0.3.0 and 0.7.0dev, using a V100.

3.) The aggressive unit tests (https://github.com/huggingface/diffusers/blob/82d56cf192f3a3c52e0708b6c8db4a6d959244dd/tests/models/test_models_unet_2d.py#L414) all pass for 0.3.0. This is good, as it means our UNet is not responsible for the potential regression above.

Overall, this issue now seems much less severe to me than it did originally, and a big part of it is probably simply due to “uncontrollable” randomness.
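To make the distinction between exact and loosely “close” reproducibility concrete, here is a rough sketch of the two kinds of check. The helper names `is_bit_exact` and `is_close` and the tolerance `atol=2` are assumptions chosen for illustration, not anything taken from the diffusers test suite, and the byte buffers are toy stand-ins for flattened 0-255 pixel data.

```python
import hashlib


def is_bit_exact(a: bytes, b: bytes) -> bool:
    """True only if the two byte buffers are identical."""
    return hashlib.sha256(a).digest() == hashlib.sha256(b).digest()


def is_close(a: bytes, b: bytes, atol: int = 2) -> bool:
    """True if lengths match and every value differs by at most atol."""
    return len(a) == len(b) and all(abs(x - y) <= atol for x, y in zip(a, b))


# Toy buffers standing in for the same render on two devices (assumption).
reference = bytes([100, 150, 200])
other_device = bytes([100, 151, 199])

print(is_bit_exact(reference, other_device))  # False
print(is_close(reference, other_device))      # True
```

A test built on the first check would fail on any off-by-one pixel, while the second tolerates small numerical drift; the right tolerance depends on the pipeline and would need to be chosen empirically.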

Add aggressive scheduler tests and check differences between 0.3.0 and 0.7.0dev
Add aggressive minimal step pipeline tests and check differences between 0.3.0 and 0.7.0dev