Add inference support for `mps` device (Apple Silicon)
There is a lot of community interest in running diffusers on Apple Silicon. A first step could be to introduce support for the `mps` device, which currently requires PyTorch nightly builds. Another step down the line could be to convert to Core ML and optimize to make sure that use of the ANE (Neural Engine) is maximized.

This issue deals with the first approach.
Describe the solution you’d like

The following should work:

```python
from diffusers import StableDiffusionPipeline

# model_id is a Stable Diffusion checkpoint, e.g. "CompVis/stable-diffusion-v1-4"
pipe = StableDiffusionPipeline.from_pretrained(model_id, use_auth_token=True)
pipe = pipe.to("mps")
# Rest of inference code remains the same
```
`pipe.to` would determine whether `mps` is available on the machine, and would raise an error if it’s not.
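A minimal sketch of such a check, using the availability APIs that `torch.backends.mps` already exposes in the nightlies (the diffusers-side wiring is hypothetical):

```python
import torch

def validate_mps():
    # is_available() reports whether the MPS backend can actually be used;
    # is_built() distinguishes "not compiled in" from "unsupported macOS/hardware".
    if not torch.backends.mps.is_available():
        if not torch.backends.mps.is_built():
            raise RuntimeError("This PyTorch build was not compiled with MPS support.")
        raise RuntimeError(
            "MPS is not available on this machine "
            "(requires macOS 12.3+ and a Metal-capable GPU)."
        )
```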
One consideration is that perhaps not all the components have to be moved to the `mps` device. Perhaps it’s more efficient or practical to keep the text encoder on the CPU, for instance. If so, `to()` would move the required components, and the pipelines would need to be adapted to move tensors between devices transparently.
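As an illustration, a hypothetical sketch of what selective placement could look like (the attributes `unet`, `vae` and `text_encoder` exist on `StableDiffusionPipeline`; the selective `to()` behavior itself does not exist yet):

```python
# Hypothetical: move only the compute-heavy components to mps,
# keeping the text encoder on the CPU.
pipe.unet.to("mps")
pipe.vae.to("mps")
pipe.text_encoder.to("cpu")

# The pipeline would then have to move intermediate tensors transparently,
# e.g. text embeddings computed on the CPU before they are fed to the unet:
# text_embeddings = text_embeddings.to("mps")
```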
Describe alternatives you’ve considered

Conversion to Core ML, as mentioned above.
Additional context
I have tested the `unet` module on `mps` vs `cpu`, and these are some preliminary results on my computer (M1 Max with 64 GB of unified RAM), when iterating through the 51 default steps of Stable Diffusion with the default scheduler and no classifier-free guidance (a sketch of the measurement loop follows the results):
```
Device: cpu,   torch: 1.12.1,             time: 92.09s
Device: cpu,   torch: 1.13.0.dev20220830, time: 92.12s
Device: mps:0, torch: 1.13.0.dev20220830, time: 17.20s
```
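A minimal sketch of how such a comparison might be run (the model id, shapes, and loop are assumptions, not the exact code behind the numbers above):

```python
import time
import torch
from diffusers import UNet2DConditionModel

def time_unet(device, steps=51, model_id="CompVis/stable-diffusion-v1-4"):
    unet = UNet2DConditionModel.from_pretrained(
        model_id, subfolder="unet", use_auth_token=True
    ).to(device)
    latents = torch.randn(1, 4, 64, 64, device=device)    # SD v1 latent shape
    embeddings = torch.randn(1, 77, 768, device=device)   # CLIP text embeddings
    start = time.perf_counter()
    with torch.no_grad():
        for t in range(steps):
            out = unet(latents, t, encoder_hidden_states=embeddings).sample
    out.cpu()  # pull the result back to force pending MPS work to complete
    return time.perf_counter() - start

print(f"cpu: {time_unet('cpu'):.2f}s")
print(f"mps: {time_unet('mps'):.2f}s")
```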
Execution on `mps` complains about one operation having to be performed on the CPU:

```
The operator 'aten::masked_select' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
```
We need to investigate the reason and whether an alternative would yield better performance.
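For illustration, `aten::masked_select` can sometimes be replaced with a shape-preserving alternative such as `torch.where`, which may remain on the MPS device; whether that applies here depends on where the op is being triggered (a hypothetical example, not a confirmed fix):

```python
import torch

x = torch.randn(4, 4, device="mps")
mask = x > 0

# aten::masked_select compacts the selected elements into a 1-D tensor
# and currently falls back to the CPU:
selected = torch.masked_select(x, mask)

# If the downstream code can tolerate a fill value instead of a compacted
# result, torch.where keeps the original shape and may stay on the device:
filled = torch.where(mask, x, torch.zeros_like(x))
```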
Top GitHub Comments
Regarding the alternative, i.e. converting the model to Core ML:

@madebyollin made a gist that can convert the U-Net to a Core ML model, which can be injected into the pipeline or used as a standalone model in a native app. However, it is not a complete model, as this library (`diffusers`) does extra work around it.

Here’s everything I’ve learned from the above linked resources as a Jupyter notebook that can be run in VSCode on macOS: StableDiffusion-MPS.ipynb.

When running the pipeline I’ve confirmed using Activity Monitor that the Python process is using the GPU.
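For reference, a conversion along those lines would have roughly the following shape (a hypothetical sketch using `coremltools`, not the gist's actual code; shapes assume Stable Diffusion v1 defaults):

```python
import torch
import coremltools as ct
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet", use_auth_token=True
)
unet.eval()

# Wrap the unet so tracing sees a plain tensor instead of an output dataclass.
class UNetWrapper(torch.nn.Module):
    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states):
        return self.unet(sample, timestep, encoder_hidden_states).sample

example_inputs = (
    torch.randn(2, 4, 64, 64),  # latents (batch of 2 for classifier-free guidance)
    torch.tensor([0.0]),        # timestep
    torch.randn(2, 77, 768),    # CLIP text embeddings
)
traced = torch.jit.trace(UNetWrapper(unet), example_inputs)
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=t.shape) for t in example_inputs],
    convert_to="mlprogram",
)
mlmodel.save("sd_unet.mlpackage")
```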