Add inference support for `mps` device (Apple Silicon)
There is a lot of community interest in running diffusers on Apple Silicon. A first step could be to introduce support for the `mps` device, which currently requires PyTorch nightly builds. Another step down the line could be to convert to Core ML and optimize to make sure that use of the ANE (Neural Engine) is maximized.

This issue deals with the first approach.
Describe the solution you’d like

The following should work:

```python
from diffusers import StableDiffusionPipeline

# model_id is a Stable Diffusion checkpoint, e.g. "CompVis/stable-diffusion-v1-4"
pipe = StableDiffusionPipeline.from_pretrained(model_id, use_auth_token=True)
pipe = pipe.to("mps")
# Rest of inference code remains the same
```
`pipe.to` would determine whether `mps` is available on the machine, and would raise an error if it’s not.
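A minimal sketch of such a check, using the availability APIs that `torch.backends.mps` already exposes in the nightlies (the diffusers-side wiring is hypothetical):

```python
import torch

def validate_mps():
    # is_available() reports whether the MPS backend can actually be used;
    # is_built() distinguishes "not compiled in" from "unsupported macOS/hardware".
    if not torch.backends.mps.is_available():
        if not torch.backends.mps.is_built():
            raise RuntimeError("This PyTorch build was not compiled with MPS support.")
        raise RuntimeError(
            "MPS is not available on this machine "
            "(requires macOS 12.3+ and a Metal-capable GPU)."
        )
```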
One consideration is that perhaps not all the components have to be moved to the `mps` device. Perhaps it’s more efficient or practical to keep the text encoder on the CPU, for instance. If so, `to()` would move the required components, and the pipelines would need to be adapted to move tensors between devices transparently.
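As an illustration, a hypothetical sketch of what selective placement could look like (the attributes `unet`, `vae` and `text_encoder` exist on `StableDiffusionPipeline`; the selective `to()` behavior itself does not exist yet):

```python
# Hypothetical: move only the compute-heavy components to mps,
# keeping the text encoder on the CPU.
pipe.unet.to("mps")
pipe.vae.to("mps")
pipe.text_encoder.to("cpu")

# The pipeline would then have to move intermediate tensors transparently,
# e.g. text embeddings computed on the CPU before they are fed to the unet:
# text_embeddings = text_embeddings.to("mps")
```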
Describe alternatives you’ve considered

Conversion to Core ML, as mentioned above.
Additional context
I have tested the `unet` module on `mps` vs `cpu`, and these are some preliminary results on my computer (M1 Max with 64 GB of unified RAM), when iterating through the 51 default steps of Stable Diffusion with the default scheduler and no classifier-free guidance (a sketch of the measurement loop follows the results):
```
Device: cpu,   torch: 1.12.1,             time: 92.09s
Device: cpu,   torch: 1.13.0.dev20220830, time: 92.12s
Device: mps:0, torch: 1.13.0.dev20220830, time: 17.20s
```
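A minimal sketch of how such a comparison might be run (the model id, shapes, and loop are assumptions, not the exact code behind the numbers above):

```python
import time
import torch
from diffusers import UNet2DConditionModel

def time_unet(device, steps=51, model_id="CompVis/stable-diffusion-v1-4"):
    unet = UNet2DConditionModel.from_pretrained(
        model_id, subfolder="unet", use_auth_token=True
    ).to(device)
    latents = torch.randn(1, 4, 64, 64, device=device)    # SD v1 latent shape
    embeddings = torch.randn(1, 77, 768, device=device)   # CLIP text embeddings
    start = time.perf_counter()
    with torch.no_grad():
        for t in range(steps):
            out = unet(latents, t, encoder_hidden_states=embeddings).sample
    out.cpu()  # pull the result back to force pending MPS work to complete
    return time.perf_counter() - start

print(f"cpu: {time_unet('cpu'):.2f}s")
print(f"mps: {time_unet('mps'):.2f}s")
```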
Execution on `mps` complains about one operation having to be performed on the CPU:

```
The operator 'aten::masked_select' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
```
We need to investigate the reason and whether an alternative would yield better performance.
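For illustration, `aten::masked_select` can sometimes be replaced with a shape-preserving alternative such as `torch.where`, which may remain on the MPS device; whether that applies here depends on where the op is being triggered (a hypothetical example, not a confirmed fix):

```python
import torch

x = torch.randn(4, 4, device="mps")
mask = x > 0

# aten::masked_select compacts the selected elements into a 1-D tensor
# and currently falls back to the CPU:
selected = torch.masked_select(x, mask)

# If the downstream code can tolerate a fill value instead of a compacted
# result, torch.where keeps the original shape and may stay on the device:
filled = torch.where(mask, x, torch.zeros_like(x))
```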
Top GitHub Comments
Regarding the alternative, i.e. converting the model to Core ML:

@madebyollin made a gist that can convert the U-Net to a Core ML model, which can be injected into the pipeline or used as a standalone model in a native app. However, it is not a complete model, as this library (`diffusers`) does extra work around it.

Here’s everything I’ve learned from the above linked resources as a Jupyter notebook that can be run in VSCode on macOS: StableDiffusion-MPS.ipynb.

When running the pipeline I’ve confirmed using Activity Monitor that the Python process is using the GPU.
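For reference, a conversion along those lines would have roughly the following shape (a hypothetical sketch using `coremltools`, not the gist's actual code; shapes assume Stable Diffusion v1 defaults):

```python
import torch
import coremltools as ct
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet", use_auth_token=True
)
unet.eval()

# Wrap the unet so tracing sees a plain tensor instead of an output dataclass.
class UNetWrapper(torch.nn.Module):
    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states):
        return self.unet(sample, timestep, encoder_hidden_states).sample

example_inputs = (
    torch.randn(2, 4, 64, 64),  # latents (batch of 2 for classifier-free guidance)
    torch.tensor([0.0]),        # timestep
    torch.randn(2, 77, 768),    # CLIP text embeddings
)
traced = torch.jit.trace(UNetWrapper(unet), example_inputs)
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=t.shape) for t in example_inputs],
    convert_to="mlprogram",
)
mlmodel.save("sd_unet.mlpackage")
```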