ValueError("8-bit operations on `bitsandbytes` are not supported under CPU!")
Hi Tim,
Thanks for your awesome work!
I’m using your method to load the largest BLOOM model (the BLOOM model with 176b parameters) onto 1 node with 8 GPUs.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bloom",
    device_map="auto",
    load_in_8bit=True,
)
This line works for all the other, smaller BLOOM models, e.g. bloom-7b1. However, when loading bloom (176b), I got the error "8-bit operations on bitsandbytes are not supported under CPU!":
File "/opt/conda/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 463, in from_pretrained
return model_class.from_pretrained(
File "/opt/conda/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2182, in from_pretrained
raise ValueError("8-bit operations on `bitsandbytes` are not supported under CPU!")
ValueError: 8-bit operations on `bitsandbytes` are not supported under CPU!
In my understanding, this is because some modules of the model are automatically loaded onto the CPU, which didn't happen with the smaller models. Is there a way to force the model to be loaded onto GPUs only? Or do you have any advice on how to bypass this error? Thanks!!
Tianwei
Issue Analytics
- State:
- Created a year ago
- Comments: 9 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
From my testing, the following seems to happen when not enough memory is available on the GPU:
- HF accelerate's automatic device selection sees device_map="auto" and puts some layers on the CPU.
- This device map, with its CPU layers, is passed onward.
- The bitsandbytes code in HF transformers sees the CPU layers and raises this confusing error message.
My guess is that you lack enough GPU memory for BLOOM.
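To see whether this is what happened, you can inspect the device map yourself before blaming bitsandbytes. With transformers, the map accelerate produced is exposed after loading as `model.hf_device_map` (a dict of module name to device). The helper below is a hypothetical sketch, and the toy map is made up to resemble what accelerate emits for BLOOM:

```python
# Hypothetical helper: scan an accelerate-style device map (module name ->
# device) for modules placed on "cpu" or "disk" -- exactly the placements
# that make the 8-bit code path raise the error above.

def find_offloaded_modules(device_map):
    """Return the module names mapped to 'cpu' or 'disk', sorted."""
    return sorted(
        name for name, device in device_map.items()
        if str(device) in ("cpu", "disk")
    )

# Toy device map shaped like the ones accelerate produces for BLOOM:
device_map = {
    "transformer.word_embeddings": 0,
    "transformer.h.0": 0,
    "transformer.h.1": 1,
    "transformer.h.2": "cpu",   # this placement would trigger the 8-bit error
    "lm_head": "disk",
}

offloaded = find_offloaded_modules(device_map)
print(offloaded)  # ['lm_head', 'transformer.h.2']
```

If this list is non-empty for your real `model.hf_device_map`, the error is a memory problem, not a bitsandbytes bug.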
I am closing this, as the issue is related to part of the model being placed on the CPU, which is currently managed by the accelerate library. If this is still relevant, please open an issue there.
Regarding the BLOOM model, I will try to debug the situation and post examples to run BLOOM in a setup similar to yours.
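In the meantime, a minimal sketch of one way to keep every layer on the GPUs, assuming the node genuinely has enough total GPU memory (176B parameters at 1 byte each in int8 is roughly 176 GB of weights, before activations). `max_memory` is an accepted argument of `from_pretrained` alongside `device_map`; the helper name and the memory figures below are illustrative assumptions, not values from this thread:

```python
# Sketch: cap per-GPU memory explicitly so accelerate's auto placement is
# given GPU budgets only. Helper name and figures are illustrative.

def build_max_memory(num_gpus, per_gpu="70GiB"):
    """Build a max_memory dict covering GPUs only (no "cpu" key), so the
    automatic device map is steered toward GPU-only placement."""
    return {gpu_id: per_gpu for gpu_id in range(num_gpus)}

max_memory = build_max_memory(8)
print(len(max_memory), max_memory[0])  # 8 70GiB

# With transformers, this would be passed alongside device_map="auto":
#
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "bloom",
#     device_map="auto",
#     load_in_8bit=True,
#     max_memory=build_max_memory(8),
# )
#
# Afterwards, check model.hf_device_map: if "cpu" or "disk" still appears
# there, the GPUs simply do not have enough room for the 8-bit model.
```

Note this only steers placement; if the model truly does not fit in the combined GPU budget, loading will still fail rather than silently offload.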