Reduce Stable Diffusion memory usage by keeping unet only on GPU.
Is your feature request related to a problem? Please describe.
Stable Diffusion is not compute-heavy in all of its steps. If we keep the diffusion UNet in fp16 on the GPU and everything else on the CPU, we could reduce GPU memory usage to 2.2GB with a not-so-big impact on performance. This should democratize Stable Diffusion even further.
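A rough check of where a figure like 2.2GB can come from. In fp16 each weight costs 2 bytes, versus 4 in fp32. The sketch below assumes the approximate Stable Diffusion v1 sizes quoted in the CompVis README (an 860M-parameter UNet and a 123M-parameter text encoder). These numbers are not from this issue. It counts weights only, so the rest of the observed GPU usage would presumably be activations and CUDA overhead, which it does not model.

```python
# Approximate parameter counts (assumed, from the CompVis Stable Diffusion v1 README).
UNET_PARAMS = 860_000_000
TEXT_ENCODER_PARAMS = 123_000_000

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_gb(n_params: int, precision: str) -> float:
    """Memory used by the weights alone, in decimal GB (activations and CUDA context excluded)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


print(f"UNet weights, fp16: {weight_gb(UNET_PARAMS, 'fp16'):.2f} GB")   # 1.72 GB
print(f"UNet weights, fp32: {weight_gb(UNET_PARAMS, 'fp32'):.2f} GB")   # 3.44 GB
print(f"Text encoder kept on CPU, fp32: {weight_gb(TEXT_ENCODER_PARAMS, 'fp32'):.2f} GB")  # 0.49 GB
```

Halving the UNet's precision and keeping the other components off the GPU both shrink what must stay resident in GPU memory.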
The only other thing that would need to be done is to move tensors between devices accordingly, but we can use the models’ device and dtype attributes to make everything work.
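The device and dtype bookkeeping described above can be sketched in plain PyTorch. Everything here is hypothetical: tiny stand-in modules replace the real text encoder, UNet and VAE decoder, and the denoising loop is a toy update rather than a real scheduler. A plain nn.Module has no device or dtype attribute, so the hypothetical helper device_and_dtype reads them from the module's first parameter. diffusers' ModelMixin and transformers' PreTrainedModel expose .device and .dtype as properties, which is what the issue proposes to rely on. Without CUDA, the "GPU" falls back to the CPU in fp32 so that the sketch still runs.

```python
import torch
from torch import nn


class ToyUNet(nn.Module):
    """Hypothetical stand-in for the UNet: predicts noise from latents plus text conditioning."""

    def __init__(self, dim: int = 16):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, latents: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        return self.proj(torch.cat([latents, cond], dim=-1))


def device_and_dtype(module: nn.Module):
    # Plain nn.Module has no .device/.dtype, so read them from the first parameter.
    param = next(module.parameters())
    return param.device, param.dtype


# Hypothetical stand-ins for the pipeline components; they start on the CPU in fp32.
text_encoder = nn.Linear(8, 16)
unet = ToyUNet(16)
vae_decoder = nn.Linear(16, 4)

# Only the UNet goes to the GPU, in fp16. Without CUDA, fall back to CPU/fp32
# (fp16 matmuls on CPU are not supported by every PyTorch version).
gpu = torch.device("cuda" if torch.cuda.is_available() else "cpu")
unet.to(device=gpu, dtype=torch.float16 if gpu.type == "cuda" else torch.float32)


@torch.no_grad()
def generate(token_features: torch.Tensor, steps: int = 3) -> torch.Tensor:
    # 1) Text encoder on CPU: move the inputs to wherever the encoder lives.
    te_device, te_dtype = device_and_dtype(text_encoder)
    cond = text_encoder(token_features.to(te_device, te_dtype))

    # 2) UNet on GPU: move the conditioning over once and create the latents there directly.
    u_device, u_dtype = device_and_dtype(unet)
    cond = cond.to(u_device, u_dtype)
    latents = torch.randn(cond.shape, device=u_device, dtype=u_dtype)
    for _ in range(steps):
        noise_pred = unet(latents, cond)
        latents = latents - 0.1 * noise_pred  # toy update, not a real scheduler step

    # 3) VAE decoder on CPU: bring the final latents back and cast them to its dtype.
    v_device, v_dtype = device_and_dtype(vae_decoder)
    return vae_decoder(latents.to(v_device, v_dtype))


out = generate(torch.randn(1, 8))
print(out.device, out.dtype, tuple(out.shape))  # cpu torch.float32 (1, 4)
```

Because each step asks the receiving module where it lives, only two host/device transfers happen per call: the text embeddings going in and the final latents coming out. The per-step UNet work stays entirely on the GPU.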
Describe the solution you’d like
I think what I’m proposing in https://github.com/huggingface/diffusers/pull/537 should be enough.
Describe alternatives you’ve considered
The alternative is to use GPUs for the whole process and pay more for it.
Top GitHub Comments
Hey @piEsposito,
I’m wondering whether we could maybe try to just write a community pipeline for this: https://github.com/huggingface/diffusers/tree/main/examples/community
I’ve created a feature request on accelerate to enable solving this in a more elegant way. If they let me work on the feature, I can open a PR and then try solving this.