Method to prepare offline cache
`from_pretrained()` can work in offline mode by loading from the cache, but we lack a method to explicitly populate that cache. I’d like something along the lines of `.cache_pretrained_for_offline()` that fetches all the files necessary for `.from_pretrained()` but doesn’t actually load the weights.
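For illustration, usage might look like the sketch below. Only the method name comes from this proposal; the model id and the `HF_HUB_OFFLINE` step are just one way it could fit together, and nothing here exists in diffusers today:

```python
from diffusers import StableDiffusionPipeline

# Hypothetical method from this proposal -- it does not exist yet.
# It would fetch every file from_pretrained() needs into the shared
# huggingface_hub cache, without instantiating models or loading weights.
StableDiffusionPipeline.cache_pretrained_for_offline("runwayml/stable-diffusion-v1-5")

# Later (possibly with HF_HUB_OFFLINE=1 set, or with the network down),
# this resolves entirely from the local cache:
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
```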
Use cases would include:
- something you do in an installation step or a “prepare for offline use” action to avoid loading delays later in the application, or in anticipation of network access becoming unavailable.
- preparing an environment (archive, container, disk image, etc.) on a low-resource machine that will then be copied over to a high-spec machine for production use.
It should be able to run without a GPU (or other intended target device for the model) or heaps of RAM.
The advantage of populating the `huggingface_hub` cache with the model, instead of saving a copy of the model to an application-specific local path, is that you get to share that cache with other applications, you don’t need any extra code to apply updates to your copy, you don’t need a switch to change from the default on-demand loading location to your local copy, etc.
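For reference, the closest thing available today is `huggingface_hub`’s `snapshot_download`, which populates that same shared cache without loading anything; a minimal sketch (the model id is just an example):

```python
from huggingface_hub import snapshot_download

# Fetches every file in the repo into the shared huggingface_hub cache,
# without loading weights into RAM or touching a GPU, and returns the
# local cache path. Note it grabs *all* files in the repo (e.g. every
# precision variant), not just the subset from_pretrained() would resolve,
# which is part of why a dedicated method would still help.
local_path = snapshot_download("runwayml/stable-diffusion-v1-5")
print(local_path)
```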
Top GitHub Comments
Mostly this, yes. It’s a lot of RAM.
Also, when you try to load fp16 weights without a CUDA device, it spams verbose warnings (probably one for each sub-model with weights) that I haven’t found a clean way to suppress.
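For what it’s worth, one blunt workaround is to drop the library loggers to error level before loading. This assumes the warnings in question go through the `transformers`/`diffusers` logging utilities, which may not hold for every message:

```python
import transformers
from diffusers.utils import logging as diffusers_logging

# Silence sub-model loading chatter (fp16-on-CPU notices, unused-weight
# notices, etc.) by raising both libraries' log levels to ERROR.
transformers.logging.set_verbosity_error()
diffusers_logging.set_verbosity_error()
```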
That’d be great for other reasons we’ve discussed, but it’s only tangentially related here. We have a requirement to make a self-contained installation for offline mode regardless of license.
I see! This PR could help a bit: https://github.com/huggingface/diffusers/pull/1450
However, it still forces one to load the model into RAM, but doing something like the sketch below would be a simple fix for now. In the future we could factor out the whole downloading function. But since this function is still very prone to change, and I don’t see the use case of 0-RAM downloading of the models as very important at the moment, I’d prefer to have one long, readable `from_pretrained` function for now.
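For concreteness, here is one possible shape of that kind of fix, sketched with an invented keyword argument; nothing below is the actual API from the PR:

```python
from diffusers import StableDiffusionPipeline

# Hypothetical kwarg, invented for illustration: make from_pretrained()
# stop after the download/caching step and return the local cached folder
# path instead of an instantiated pipeline, so no weights are ever
# loaded into RAM or moved to a device.
cached_folder = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    return_cached_folder=True,  # invented for illustration
)
```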