Optimize model loading by natively using Accelerate, please.
Is your feature request related to a problem? Please describe.
Loading a pretrained model, for example UNet2DConditionModel, peaks at about 8 GB of RAM, even though the fully loaded model takes far less memory once it is on the GPU. That likely happens because the checkpoint weights and the freshly initialized model briefly coexist in memory, duplicating the parameters.
When we instead initialize the model with empty weights and load the checkpoint with Accelerate, the peak drops to 4.91 GB, which lets us deploy Diffusers on servers with much less RAM and therefore at lower cost.
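For reference, a minimal way to reproduce the baseline measurement (Linux-only, since it relies on `resource`; the model id is just an illustrative example):

```python
# Rough reproduction of the baseline peak-RAM observation.
# On Linux, ru_maxrss is reported in kilobytes.
import resource

from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet"
)

peak_gb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1e6
print(f"Peak RSS while loading: ~{peak_gb:.2f} GB")
```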
Describe the solution you’d like
It would be great if Diffusers natively supported loading pretrained models with `accelerate.init_empty_weights` and `accelerate.load_checkpoint_and_dispatch`, so that `from_pretrained` has a lower memory footprint. A rough sketch of what that could look like is below.
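A minimal sketch of the two-step meta-device flow from Accelerate applied to a Diffusers model (the config-loading helper and the local checkpoint path are illustrative and may differ between Diffusers versions):

```python
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from diffusers import UNet2DConditionModel

# Load only the config, then build the module on the "meta" device:
# no RAM is allocated for the parameters at this point.
config = UNet2DConditionModel.load_config(
    "CompVis/stable-diffusion-v1-4", subfolder="unet"
)
with init_empty_weights():
    unet = UNet2DConditionModel.from_config(config)

# Stream the checkpoint tensors directly onto the target device(s),
# so the pretrained weights never coexist with a redundant CPU copy.
unet = load_checkpoint_and_dispatch(
    unet,
    checkpoint="path/to/unet/diffusion_pytorch_model.bin",  # local weights file (placeholder path)
    device_map="auto",
)
```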
Describe alternatives you’ve considered
- Loading the models with Accelerate myself, but that requires model surgery and a lot of custom code, which makes the process error-prone
- Reimplementing what Accelerate does by hand
Additional context
Using Accelerate to natively load models would also reduce the RAM footprint when loading pipelines composed of more than one model, since each component could be allocated directly on its target device, again letting us deploy Diffusers on cheaper servers.
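To make the idea concrete, the user-facing API could look roughly like this; the `low_cpu_mem_usage` and `device_map` arguments here are part of the proposal, not necessarily what Diffusers exposes today:

```python
import torch
from diffusers import StableDiffusionPipeline

# Proposed: initialize every submodule with empty weights, then dispatch
# each checkpoint straight to its target device while loading.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,  # proposed flag: skip the redundant random init
    device_map="auto",       # proposed flag: place weights as they are loaded
)
```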
Top GitHub Comments
Happy to include `accelerate`! Maybe we can follow the same logic we have in `transformers` with `device_map="auto"` 😃 And definitely +1 to making use of torch’s meta device to halve the peak required memory usage!

Hey @piEsposito, happy to look into that in the coming week! If you have a code snippet that works for you, feel free to include it here as a reference 🤗
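For reference, a minimal sketch of the `transformers` behavior the comment refers to (requires `accelerate` to be installed; the model id is arbitrary):

```python
from transformers import AutoModelForCausalLM

# device_map="auto" lets Accelerate place the weights as they are loaded,
# avoiding a full extra copy of the model in CPU RAM.
model = AutoModelForCausalLM.from_pretrained("gpt2", device_map="auto")
```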
@muellerzr @patrickvonplaten wdyt? We can either do this or bring over the low-RAM loading code from transformers.