ValueError("8-bit operations on `bitsandbytes` are not supported under CPU!")
Hi Tim,
Thanks for your awesome work!
I’m using your method to load the largest BLOOM model (the BLOOM model with 176b parameters) onto 1 node with 8 GPUs.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bloom",
    device_map="auto",
    load_in_8bit=True,
)
This line works for all the other, smaller BLOOM models, e.g. bloom-7b1. However, when loading bloom (176b), I got the error "8-bit operations on bitsandbytes are not supported under CPU!":
File "/opt/conda/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 463, in from_pretrained
return model_class.from_pretrained(
File "/opt/conda/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2182, in from_pretrained
raise ValueError("8-bit operations on `bitsandbytes` are not supported under CPU!")
ValueError: 8-bit operations on `bitsandbytes` are not supported under CPU!
In my understanding, this is because some modules of the model are automatically loaded onto the CPU, which didn't happen with the smaller models. Is there a way to force the model to be loaded onto GPUs only? Or do you have any advice on how to bypass this error? Thanks!!
Tianwei
Issue Analytics
- State:
- Created a year ago
- Comments: 9 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
From my testing, the following seems to happen when not enough memory is available on the GPU:
- HF accelerate's automatic device selection sees device_map="auto" and puts some layers on the CPU.
- This device map, with its CPU layers, is passed onward.
- The bitsandbytes code in HF transformers sees the CPU layers and raises this confusing error message.
My guess is that you lack enough GPU memory for BLOOM.
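To see whether this is what happened, you can inspect the device map yourself before blaming bitsandbytes. With transformers, the map accelerate produced is exposed after loading as `model.hf_device_map` (a dict of module name to device). The helper below is a hypothetical sketch, and the toy map is made up to resemble what accelerate emits for BLOOM:

```python
# Hypothetical helper: scan an accelerate-style device map (module name ->
# device) for modules placed on "cpu" or "disk" -- exactly the placements
# that make the 8-bit code path raise the error above.

def find_offloaded_modules(device_map):
    """Return the module names mapped to 'cpu' or 'disk', sorted."""
    return sorted(
        name for name, device in device_map.items()
        if str(device) in ("cpu", "disk")
    )

# Toy device map shaped like the ones accelerate produces for BLOOM:
device_map = {
    "transformer.word_embeddings": 0,
    "transformer.h.0": 0,
    "transformer.h.1": 1,
    "transformer.h.2": "cpu",   # this placement would trigger the 8-bit error
    "lm_head": "disk",
}

offloaded = find_offloaded_modules(device_map)
print(offloaded)  # ['lm_head', 'transformer.h.2']
```

If this list is non-empty for your real `model.hf_device_map`, the error is a memory problem, not a bitsandbytes bug.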
I am closing this, as the issue is related to part of the model being placed on the CPU, which is currently managed by the accelerate library. If this is still relevant, please open an issue there.
Regarding the BLOOM model, I will try to debug the situation and post examples to run BLOOM in a setup similar to yours.
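In the meantime, a minimal sketch of one way to keep every layer on the GPUs, assuming the node genuinely has enough total GPU memory (176B parameters at 1 byte each in int8 is roughly 176 GB of weights, before activations). `max_memory` is an accepted argument of `from_pretrained` alongside `device_map`; the helper name and the memory figures below are illustrative assumptions, not values from this thread:

```python
# Sketch: cap per-GPU memory explicitly so accelerate's auto placement is
# given GPU budgets only. Helper name and figures are illustrative.

def build_max_memory(num_gpus, per_gpu="70GiB"):
    """Build a max_memory dict covering GPUs only (no "cpu" key), so the
    automatic device map is steered toward GPU-only placement."""
    return {gpu_id: per_gpu for gpu_id in range(num_gpus)}

max_memory = build_max_memory(8)
print(len(max_memory), max_memory[0])  # 8 70GiB

# With transformers, this would be passed alongside device_map="auto":
#
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "bloom",
#     device_map="auto",
#     load_in_8bit=True,
#     max_memory=build_max_memory(8),
# )
#
# Afterwards, check model.hf_device_map: if "cpu" or "disk" still appears
# there, the GPUs simply do not have enough room for the 8-bit model.
```

Note this only steers placement; if the model truly does not fit in the combined GPU budget, loading will still fail rather than silently offload.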