StableDiffusionPipeline producing unexpected output with MPS device using diffusers==0.4.0
Describe the bug
I tried testing the potential speed updates of diffusers 0.4.0 on my M1 mac using an existing StableDiffusionPipeline-based script, and I found that a large image that would take ~3 min to generate in diffusers 0.3.0 was estimated to take more than 10x as long.
Since my existing script had a lot going on (e.g. large resolutions, attention slicing), I tried to diagnose the problem with a minimal script (see below), running in two identical environments, with the only difference being the diffusers version.
In diffusers 0.3.0, it takes ~35 seconds to generate a reasonable result like this:
In diffusers 0.4.0, it takes ~50 seconds (slower than 0.3.0, but better than the 10x performance hit I was getting before), but each attempt (with varying seeds) triggered the NSFW filter. With the filter disabled, the results appear to be just noise:
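For reference, a common way to bypass the filter in diffusers of this era was to swap in a passthrough safety checker. This is a hedged sketch — the exact call signature of `safety_checker` may differ between diffusers versions, and the `pipe` assignment below assumes an already-constructed `StableDiffusionPipeline`:

```python
def no_op_safety_checker(images, **kwargs):
    # Passthrough checker: return the images unchanged and report
    # that none of them were flagged as NSFW.
    return images, [False] * len(images)

# Assumed usage (attribute name `safety_checker` as used in diffusers 0.x):
# pipe.safety_checker = no_op_safety_checker
```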
I’m not sure whether the 10x performance hit I initially observed in my original script would be fixed by fixing this bug, but it certainly seems to be at least part of it.
Reproduction
```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("mps")

result = pipe("dogs playing poker", generator=torch.manual_seed(1))
result.images[0].save("test.png")
```
Logs
Under 0.4.0 there's also this warning:
```
/opt/homebrew/Caskroom/miniforge/base/envs/sd/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py:222: UserWarning: The operator 'aten::repeat_interleave.self_int' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  text_embeddings = text_embeddings.repeat_interleave(num_images_per_prompt, dim=0)
```
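The warning above is the likely culprit for at least part of the slowdown: `repeat_interleave` silently falls back to the CPU on MPS. Its semantics duplicate each row in place along a dimension, which is not the same as tiling with `Tensor.repeat`. A minimal torch-free sketch of the two orderings (list-based, purely illustrative):

```python
def repeat_interleave_rows(rows, n):
    # torch.Tensor.repeat_interleave(n, dim=0) semantics:
    # each row is duplicated n times in place -> [a, a, b, b]
    return [row for row in rows for _ in range(n)]

def tile_rows(rows, n):
    # torch.Tensor.repeat(n, 1, ...) semantics, by contrast,
    # tile the whole batch -> [a, b, a, b]
    return [row for _ in range(n) for row in rows]
```

Because the pipeline needs the interleaved ordering (each prompt's embedding paired with its own latents), naively swapping in `.repeat()` along dim 0 would mispair prompts and latents; an MPS-friendly rewrite instead reproduces the interleaved layout with repeat/view on-device, which I believe is how later diffusers releases addressed it.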
System Info
- diffusers version: 0.4.0
- Platform: macOS-12.6-arm64-arm-64bit
- Python version: 3.10.6
- PyTorch version (GPU?): 1.13.0.dev20220911 (False)
- Huggingface_hub version: 0.10.0
- Transformers version: 4.21.3
- Using GPU in script?: MPS
- Using distributed or parallel set-up in script?: no
Issue Analytics
- Created a year ago
- Comments: 16 (12 by maintainers)
Top GitHub Comments
@patrickvonplaten Not blaming you guys, just noting what was previously reported 😄 Please look at the bug report I filed about a month back (https://github.com/huggingface/diffusers/issues/548) for numbers, which I already provided.
The numbers might have changed since 0.4.0 (and subsequent releases), but I have not taken proper measurements since then. I do install the latest version every once in a while (on top of PyTorch 1.13.0.dev20220924, since after that PyTorch slows down a lot too — the numbers for that are here: https://github.com/pytorch/pytorch/issues/86048) and test when I can, but it continues to be slow.
Do note, I do appreciate everything you guys are doing and am simply reporting issues as I find them since I do realize that not everybody uses MPS. I wish I could do more by helping with the code, but I’m not up to the level of working on the code at the diffusers level - doing a GUI is about all I can do at this point 😄
But if there’s anything I can do to help (since I do work with MPS day in and day out and do work on Stable Diffusion related stuff whenever I can) do let me know.
Apologies, I didn’t mean to suggest that you were offended — in fact, I was trying to make clear that I wasn’t offended or blaming you guys 😄
Regarding numbers and versions, I thought I did provide those but I do tend to be a bit verbose and so the way I provide the info might not be very clear. Please let me know what would make things clearer, or if perhaps I can provide more info and I’d be happy to do so.
I did provide the latest numbers to @pcuenca with versions on https://github.com/huggingface/diffusers/issues/548 but will reproduce here in more concise form in case it’s helpful. Please let me know if you’d prefer something like this or if I’m missing other useful info and I’d be happy to elaborate.
These tests were done yesterday to compare diffusers 0.3.0 and 0.5.1 and both used PyTorch 1.13.0.dev20220924 so as to eliminate the slowdown issues that come from the PyTorch side (PyTorch became slower on MPS only after 25-09-2022). Each test was for 50 steps generating the same prompt.
| diffusers | 512 x 512 | 768 x 512 |
|---|---|---|
| 0.5.1 | 57s | 19 min 17s |
| 0.3.0 | 32s | 58s |
Do note that with the early 0.4.0 builds (before it was released) a 512 x 512 image used to take about 80s, but it seemed faster with 0.5.1 when I tested yesterday. The numbers are based on a single run and are not averaged over several runs.
Also, I monitored CPU/GPU usage while the above runs were going on: 0.5.1 appears to load the CPU and GPU about equally (with more emphasis on the CPU), while 0.3.0 uses the GPU a lot more (especially for the 768 x 512 image) and finishes far more quickly.
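Since the numbers above come from single runs, a small harness along these lines could average several runs to reduce noise. This is a generic sketch: in a real benchmark `fn` would wrap a pipeline call such as `pipe(prompt, height=512, width=512)` (assumed, not shown here):

```python
import time

def time_runs(fn, runs=3):
    # Call fn() `runs` times and return the mean wall-clock seconds.
    # A warm-up call before timing would also help on MPS, where the
    # first run includes shader compilation overhead.
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)
```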