DDIM sampler on Stable Diffusion does not work well with a CFG guidance scale larger than 6~7

Describe the bug

I have noticed that SD v1.4 or v1.5 works poorly if I swap the scheduler to DDIM and use a guidance scale larger than 7. This behavior does not seem to be obvious with other samplers, such as the default PNDM scheduler. At first I suspected it was related to the “train-test mismatch” mentioned in Sec. 2.3 of the Imagen paper. However, I found that the same DDIM sampler in WebUI does not suffer from the same performance degradation at the same guidance scale. I manually traced the scale of the predicted epsilon values under the same prompt and guidance scale in both the WebUI DDIM sampler and the diffusers DDIM sampler; they are all within a similar range of [-4, 4]. I also printed out the beta schedules of both, and they are very close (of course, I tried replacing the diffusers DDIM betas with WebUI’s DDIM betas, but it did not help).

Screenshot 2022-12-08 at 15 08 04
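For readers unfamiliar with the guidance scale discussed above, here is a minimal sketch of the standard classifier-free guidance combination, written in plain Python. The function name `apply_cfg` and the numeric values are made up for illustration and are not taken from diffusers or WebUI; the only thing assumed is the usual formula `eps = eps_uncond + scale * (eps_cond - eps_uncond)`.

```python
def apply_cfg(eps_uncond, eps_cond, guidance_scale):
    """Combine unconditional and text-conditioned noise predictions.

    Hypothetical helper: standard classifier-free guidance formula,
    applied element-wise to plain Python lists.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Made-up noise predictions, only to show how the scale amplifies the difference.
eps_uncond = [0.10, -0.20, 0.30]
eps_cond = [0.30, -0.10, 0.10]

for scale in (1.0, 7.5, 10.0):
    guided = apply_cfg(eps_uncond, eps_cond, scale)
    print(scale, [round(v, 2) for v in guided])
```

With a scale of 1 the result is just the conditioned prediction; larger scales push the combined prediction further along the direction `eps_cond - eps_uncond`, which is why any later step that truncates large values matters more at high guidance.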

Reproduction

Reproduction is easy, and the behavior of the following code snippet is consistent across diffusers versions from 0.3 to 0.9:

from diffusers import StableDiffusionPipeline, DDIMScheduler
# swap any SD model here
pipe = StableDiffusionPipeline.from_pretrained('/xxxx/stable-diffusion-v1-5').to(0)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
prompt = "A photograph of an astronaut riding a horse on Mars, high resolution, high definition."
# change the guidance scale from 1 - 15 and observe the performance degradation
image = pipe(prompt=prompt, num_inference_steps=100, guidance_scale=10., strength=1.).images[0]

guidance_scale = 2.5 Screenshot 2022-12-08 at 15 19 25

guidance_scale = 5.0 Screenshot 2022-12-08 at 15 20 10

guidance_scale = 7.5 Screenshot 2022-12-08 at 15 22 16

guidance_scale = 10. Screenshot 2022-12-08 at 15 21 10

Logs

No response

System Info

Both diffusers == 0.3.0 and diffusers == 0.9.0 suffer from this issue. SD v1-4 and v1-5 are both affected as well.

Issue Analytics

State:
Created 9 months ago
Comments:5 (5 by maintainers)

Top GitHub Comments

Randolph-zeng commented, Dec 13, 2022

@patrickvonplaten OMG!! Yes, this is exactly the reason why it is failing!!! Thanks a lot for finding it out. This bug really tortured me for a week! I checked everything but just did not check the clipping, and that’s why DDIM is failing: its normal range is [-4, 4], and clipping makes all the guidance signal go away! Thanks again, I will close this issue now.
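To illustrate the clipping effect described in the comment above, here is a rough, self-contained sketch. It assumes the clipping in question is a clamp of the predicted clean sample to [-1, 1] inside the DDIM step; the helper `predict_x0` and all numbers are hypothetical and chosen only so that the values fall outside [-1, 1], as latents in a [-4, 4] range would.

```python
import math


def predict_x0(x_t, eps, alpha_bar_t, clip_sample):
    """DDIM-style estimate of the clean sample from a noisy sample and predicted noise.

    Hypothetical helper for illustration; optionally clamps the estimate to [-1, 1].
    """
    sqrt_ab = math.sqrt(alpha_bar_t)
    sqrt_one_minus_ab = math.sqrt(1.0 - alpha_bar_t)
    x0 = [(x - sqrt_one_minus_ab * e) / sqrt_ab for x, e in zip(x_t, eps)]
    if clip_sample:
        x0 = [max(-1.0, min(1.0, v)) for v in x0]
    return x0


# Made-up noisy latent and two made-up guided noise predictions
# (think: the same prompt at two different guidance scales).
x_t = [2.0, -1.5, 0.4]
eps_low = [0.5, -0.2, 0.1]
eps_high = [-0.5, 0.6, 0.3]
alpha_bar_t = 0.5

for clip in (False, True):
    low = predict_x0(x_t, eps_low, alpha_bar_t, clip)
    high = predict_x0(x_t, eps_high, alpha_bar_t, clip)
    print("clip_sample =", clip)
    print("  low :", [round(v, 3) for v in low])
    print("  high:", [round(v, 3) for v in high])
```

Without clamping, the two noise predictions give different estimates in every component. With clamping, every component whose estimate lies outside [-1, 1] collapses to the same boundary value, so the difference introduced by guidance is lost there. In diffusers this behavior is controlled by the `clip_sample` argument of `DDIMScheduler`; whether that is the only change needed for a given checkpoint is not something this sketch can confirm.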

patrickvonplaten commented, Dec 12, 2022

Here is the PR that updates the conversion script: https://github.com/huggingface/diffusers/pull/1667
