
[Tracker] [bnb] Supporting `device_map` containing GPU and CPU devices


Feature request

We should be able to provide a custom `device_map` when loading models in 8-bit with bitsandbytes. This would give users more control over which modules they want to quantize.

Linked issue: https://github.com/TimDettmers/bitsandbytes/issues/40

Motivation

Users should be able to pass their own custom `device_map` and choose which modules should or should not be quantized.
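
For illustration, here is a sketch of what the requested usage could look like, built on the existing `from_pretrained` parameters (`device_map` and `load_in_8bit`); the checkpoint and the CPU/GPU split below are hypothetical:

    from transformers import AutoModelForCausalLM

    # Hypothetical custom map: keep the embeddings and transformer blocks
    # on GPU 0 (where they would be quantized to 8-bit by bitsandbytes),
    # and leave the final norm and head on CPU. Passing a map like this
    # together with load_in_8bit=True is what this feature request would
    # enable; it is not supported at the time of this issue.
    device_map = {
        "transformer.wte": 0,
        "transformer.wpe": 0,
        "transformer.h": 0,
        "transformer.ln_f": "cpu",
        "lm_head": "cpu",
    }

    model = AutoModelForCausalLM.from_pretrained(
        "gpt2",  # any causal LM checkpoint; gpt2 is just an example
        device_map=device_map,
        load_in_8bit=True,
    )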

Your contribution

Try coding this enhancement!

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 15 (4 by maintainers)

Top GitHub Comments

1 reaction
z80maniac commented, Nov 17, 2022

I’ve just tested that PR and it works. Thank you!

I tested it with a 13B model on an RTX 3060. Without load_in_8bit, only 10 layers fit into the GPU; with that patch and load_in_8bit=True, 19 layers fit, which gives about a 30% inference speedup in my case.

For some reason when I test it on my initial example, it gives this warning:

    /home/user/test/bnb-test/transformers/src/transformers/generation/utils.py:1470: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.
      warnings.warn(
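
(For reference, the fix the warning suggests is simply to move the inputs onto the model's device before generation. A minimal sketch, assuming a `tokenizer` and `model` are already loaded:)

    # Put the prompt tensors on the same device as the model before
    # calling .generate(), as the warning suggests.
    inputs = tokenizer("Hello, my name is", return_tensors="pt")
    input_ids = inputs.input_ids.to(model.device)
    output = model.generate(input_ids, max_new_tokens=20)
    print(tokenizer.decode(output[0], skip_special_tokens=True))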

However, I was not able to reproduce it in my other more complex program.

In the PR’s discussion it was said:

this will result in weights offloaded on the CPU to not be converted in int8 at all

I expected this much, but I think it’s still better than nothing.

Though, are there any gotchas in the fact that the CPU layers are not converted to 8-bit?

Also, not sure how to proceed next. You said:

we should probably wait until bitsandbytes supports weights offloading in 8-bit to add this feature

So I suppose this issue should remain open? I will then add more info to my initial issue at the bitsandbytes repo.

1 reaction
z80maniac commented, Sep 18, 2022

UPDATE (for future readers): the title was changed.


I think that the title of this issue is a little bit misleading. Technically, a custom device_map is already supported for bitsandbytes, as long as all the layers are on GPU.

For example, in the linked issue, this device_map works correctly:

    device_map = {
        "transformer.wte": 0,
        "transformer.wpe": 0,
        "transformer.ln_f": 0,
        "lm_head": 0,
        "transformer.h.0": 0,
        "transformer.h.1": 0,
        "transformer.h.2": 0,
        "transformer.h.3": 0,
        "transformer.h.4": 0,
        "transformer.h.5": 0,
        "transformer.h.6": 0,
        "transformer.h.7": 0,
        "transformer.h.8": 0,
        "transformer.h.9": 0,
        "transformer.h.10": 0,
        "transformer.h.11": 0
    }

And I believe there would be no problem in using 1 instead of 0 for any transformer.* layer if you have more than one GPU (though I may be mistaken; I didn't find any specific documentation about using bitsandbytes with multiple GPUs). I suppose that replacing all 0s with 1s would also work. So users can already customize the device map, as long as it doesn't put anything on the CPU.
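
For example, a two-GPU variant of the map above could split the blocks like this (a sketch only, given the lack of documentation noted above; the split point is arbitrary):

    # Hypothetical two-GPU split of the same 12-layer model: embeddings
    # and the first 6 blocks on GPU 0, the remaining blocks plus the
    # final norm and head on GPU 1.
    device_map = {
        "transformer.wte": 0,
        "transformer.wpe": 0,
        "transformer.ln_f": 1,
        "lm_head": 1,
        **{f"transformer.h.{i}": 0 if i < 6 else 1 for i in range(12)},
    }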

The original issue was not about a custom map. It was about supporting the load_in_8bit flag for models that are split between CPU and GPU.
