StableDiffusionPipeline producing unexpected output with MPS device using diffusers==0.4.0
Describe the bug
I tried testing the potential speed updates of diffusers 0.4.0 on my M1 mac using an existing StableDiffusionPipeline-based script, and I found that a large image that would take ~3 min to generate in diffusers 0.3.0 was estimated to take more than 10x as long.
Since my existing script had a lot going on (e.g. large resolutions, attention slicing), I tried to diagnose the problem with a minimal script (see below), running in two identical environments, with the only difference being the diffusers version.
In diffusers 0.3.0, it takes ~35 seconds to generate a reasonable result like this:
In diffusers 0.4.0, it takes ~50 seconds (slower than 0.3.0, but better than the 10x performance hit I was getting before), but each attempt (with varying seeds) triggered the NSFW filter. With the filter disabled, the results appear to be just noise:
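For reference, a common way to bypass the filter in diffusers of this era was to swap in a passthrough safety checker. This is a hedged sketch — the exact call signature of `safety_checker` may differ between diffusers versions, and the `pipe` assignment below assumes an already-constructed `StableDiffusionPipeline`:

```python
def no_op_safety_checker(images, **kwargs):
    # Passthrough checker: return the images unchanged and report
    # that none of them were flagged as NSFW.
    return images, [False] * len(images)

# Assumed usage (attribute name `safety_checker` as used in diffusers 0.x):
# pipe.safety_checker = no_op_safety_checker
```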
I’m not sure whether the 10x performance hit I initially observed in my original script would be fixed by fixing this bug, but it certainly seems to be at least part of it.
Reproduction
```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("mps")

result = pipe("dogs playing poker", generator=torch.manual_seed(1))
result.images[0].save("test.png")
```
Logs
Under 0.4.0 there's also this warning:
```
/opt/homebrew/Caskroom/miniforge/base/envs/sd/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py:222: UserWarning: The operator 'aten::repeat_interleave.self_int' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  text_embeddings = text_embeddings.repeat_interleave(num_images_per_prompt, dim=0)
```
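The warning above is the likely culprit for at least part of the slowdown: `repeat_interleave` silently falls back to the CPU on MPS. Its semantics duplicate each row in place along a dimension, which is not the same as tiling with `Tensor.repeat`. A minimal torch-free sketch of the two orderings (list-based, purely illustrative):

```python
def repeat_interleave_rows(rows, n):
    # torch.Tensor.repeat_interleave(n, dim=0) semantics:
    # each row is duplicated n times in place -> [a, a, b, b]
    return [row for row in rows for _ in range(n)]

def tile_rows(rows, n):
    # torch.Tensor.repeat(n, 1, ...) semantics, by contrast,
    # tile the whole batch -> [a, b, a, b]
    return [row for _ in range(n) for row in rows]
```

Because the pipeline needs the interleaved ordering (each prompt's embedding paired with its own latents), naively swapping in `.repeat()` along dim 0 would mispair prompts and latents; an MPS-friendly rewrite instead reproduces the interleaved layout with repeat/view on-device, which I believe is how later diffusers releases addressed it.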
System Info
- diffusers version: 0.4.0
- Platform: macOS-12.6-arm64-arm-64bit
- Python version: 3.10.6
- PyTorch version (GPU?): 1.13.0.dev20220911 (False)
- Huggingface_hub version: 0.10.0
- Transformers version: 4.21.3
- Using GPU in script?: MPS
- Using distributed or parallel set-up in script?: no
Issue Analytics
- Created a year ago
- Comments: 16 (12 by maintainers)
Top GitHub Comments
@patrickvonplaten Not blaming you guys, just noting what was previously reported 😄 Please look at the bug report I filed about a month back (https://github.com/huggingface/diffusers/issues/548) for numbers, which I already provided.
The numbers might have changed since 0.4.0 (and subsequent releases), but I have not taken proper measurements since then. I do install the latest version every once in a while (on top of PyTorch 1.13.0.dev20220924, since after that PyTorch slows down a lot too — the numbers for that are here: https://github.com/pytorch/pytorch/issues/86048) and test when I can, but it continues to be slow.
Do note, I do appreciate everything you guys are doing and am simply reporting issues as I find them since I do realize that not everybody uses MPS. I wish I could do more by helping with the code, but I’m not up to the level of working on the code at the diffusers level - doing a GUI is about all I can do at this point 😄
But if there’s anything I can do to help (since I do work with MPS day in and day out and do work on Stable Diffusion related stuff whenever I can) do let me know.
Apologies, I didn’t mean to suggest that you were offended — in fact, I was trying to make clear that I wasn’t offended or blaming you guys 😄
Regarding numbers and versions, I thought I did provide those but I do tend to be a bit verbose and so the way I provide the info might not be very clear. Please let me know what would make things clearer, or if perhaps I can provide more info and I’d be happy to do so.
I did provide the latest numbers to @pcuenca with versions on https://github.com/huggingface/diffusers/issues/548 but will reproduce here in more concise form in case it’s helpful. Please let me know if you’d prefer something like this or if I’m missing other useful info and I’d be happy to elaborate.
These tests were done yesterday to compare diffusers 0.3.0 and 0.5.1 and both used PyTorch 1.13.0.dev20220924 so as to eliminate the slowdown issues that come from the PyTorch side (PyTorch became slower on MPS only after 25-09-2022). Each test was for 50 steps generating the same prompt.
| diffusers | 512 x 512 | 768 x 512 |
|---|---|---|
| 0.5.1 | 57s | 19 min 17s |
| 0.3.0 | 32s | 58s |
Do note that with the early 0.4.0 builds (before it was released) a 512 x 512 image used to take about 80s, but it seemed faster with 0.5.1 when I tested yesterday. The numbers are based on a single run and are not averaged over several runs.
Also, I monitored CPU/GPU usage while the above runs were going on: 0.5.1 appears to load the CPU and GPU about equally (with more emphasis on the CPU), while 0.3.0 uses the GPU a lot more (especially for the 768 x 512 image) and finishes far more quickly.
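Since the numbers above come from single runs, a small harness along these lines could average several runs to reduce noise. This is a generic sketch: in a real benchmark `fn` would wrap a pipeline call such as `pipe(prompt, height=512, width=512)` (assumed, not shown here):

```python
import time

def time_runs(fn, runs=3):
    # Call fn() `runs` times and return the mean wall-clock seconds.
    # A warm-up call before timing would also help on MPS, where the
    # first run includes shader compilation overhead.
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)
```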