
Add inference support for `mps` device (Apple Silicon)


There is a lot of community interest in running diffusers on Apple Silicon. A first step could be to introduce support for the mps device, which currently requires PyTorch nightly builds. Another step down the line could be to convert to Core ML and optimize to make sure that the use of the ANE (Neural Engine) is maximized.

This PR deals with the first approach.

Describe the solution you’d like

The following should work:

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(model_id, use_auth_token=True)
pipe = pipe.to("mps")

# Rest of inference code remains the same

pipe.to would determine whether mps is available on the machine, and would raise an error if it is not.
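That availability check could be sketched as follows. This is a hypothetical helper, not the actual diffusers implementation; `mps_available` stands in for `torch.backends.mps.is_available()` so the logic is illustrated without requiring a PyTorch build with MPS support:

```python
def resolve_device(requested: str, mps_available: bool) -> str:
    """Validate a requested device string, raising early if mps is absent.

    Hypothetical sketch: `mps_available` would come from
    torch.backends.mps.is_available() in a real implementation.
    """
    if requested == "mps" and not mps_available:
        raise ValueError(
            "Requested the mps device, but this build of PyTorch has no MPS "
            "support (requires macOS on Apple Silicon and a recent PyTorch)."
        )
    return requested
```

Failing fast here gives users a clear error instead of the opaque runtime failures that would otherwise surface deep inside the pipeline.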

One consideration is that perhaps not all components have to be moved to the mps device. It might be more efficient or practical to keep the text encoder on the CPU, for instance. If so, to() would move only the required components, and the pipelines would need to be adapted to move tensors between devices transparently.
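The selective-placement idea could be sketched like this. This is purely illustrative (the class name and API are hypothetical, and real pipelines hold nn.Modules rather than device strings):

```python
class ComponentDeviceMap:
    """Track a device per pipeline component so to() can move a subset.

    Hypothetical sketch: only device strings are tracked here, to
    illustrate selective placement without depending on torch.
    """

    def __init__(self, component_names):
        self.devices = {name: "cpu" for name in component_names}

    def to(self, device, only=None):
        # Move every component, or just the named subset (e.g. keep the
        # text encoder on CPU while unet and vae go to mps).
        for name in self.devices:
            if only is None or name in only:
                self.devices[name] = device
        return self


placement = ComponentDeviceMap(["unet", "vae", "text_encoder"])
placement.to("mps", only=["unet", "vae"])
print(placement.devices)
# {'unet': 'mps', 'vae': 'mps', 'text_encoder': 'cpu'}
```

With such a map, pipeline code would consult each component's device before calling it and move input tensors accordingly.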

Describe alternatives you’ve considered

Conversion to Core ML, as mentioned above.

Additional context

I have tested the unet module on mps vs cpu. These are some preliminary results on my machine (M1 Max with 64 GB of unified RAM), iterating through the 51 default steps of Stable Diffusion with the default scheduler and no classifier-free guidance:

Device: cpu, torch: 1.12.1,  time: 92.09s
Device: cpu, torch: 1.13.0.dev20220830,  time: 92.12s
Device: mps:0, torch: 1.13.0.dev20220830,  time: 17.20s

Execution on mps warns that one operation has to be performed on the CPU:

The operator 'aten::masked_select' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)

We need to investigate the reason and whether an alternative would yield better performance.
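For unsupported operations that do not fall back automatically, PyTorch exposes an opt-in CPU-fallback switch. The sketch below shows how it would be enabled; note this is an assumption about nightly-build behavior circa 2022, and the warning above suggests aten::masked_select already falls back on its own:

```python
import os

# Must be set before torch is imported: lets ops unsupported on the MPS
# backend fall back to CPU instead of raising NotImplementedError.
# (Assumption: PyTorch nightly behavior at the time of this issue.)
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
print(os.environ["PYTORCH_ENABLE_MPS_FALLBACK"])
```

The fallback keeps pipelines running at the cost of device-to-host copies, which is exactly the performance implication the warning mentions.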

Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 3
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

mlajtos commented on Sep 3, 2022 (1 reaction)

Regarding the alternative, i.e. converting the model to CoreML:

@madebyollin made a gist that can convert the U-Net to a Core ML model, which can be injected into the pipeline or used as a standalone model in a native app. However, it is not a complete model, as this library (diffusers) does extra work around it.

mja commented on Sep 2, 2022 (1 reaction)

Here’s everything I’ve learned from the resources linked above, as a Jupyter notebook that can be run in VSCode on macOS: StableDiffusion-MPS.ipynb.

When running the pipeline I’ve confirmed using Activity Monitor that the Python process is using the GPU.


