[Tracker] [bnb] Supporting `device_map` containing GPU and CPU devices
Feature request
We should be able to provide a custom `device_map` when using 8-bit models with bitsandbytes. This would give users more control over which modules they want to quantize.
Linked issue: https://github.com/TimDettmers/bitsandbytes/issues/40
Motivation
Users should be able to pass their own custom `device_map` and choose which modules should be quantized and which should not, as sketched below.
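For a concrete picture, here is a minimal sketch of the kind of call this request would enable. The model id (`bigscience/bloom-560m`) and module names are only illustrative, and whether the CPU entries are accepted depends on the `transformers` and `bitsandbytes` versions in use:

```python
# Illustrative sketch only: the combination of a CPU entry in device_map
# with load_in_8bit=True is exactly what this issue asks to support, so
# this call may error out on versions that predate the feature.
from transformers import AutoModelForCausalLM

model_id = "bigscience/bloom-560m"  # stand-in model with 24 blocks

# Keep the embeddings and the first blocks on GPU 0 (quantized to 8-bit),
# and offload the remaining modules to the CPU (left unquantized).
device_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    **{f"transformer.h.{i}": 0 for i in range(12)},
    **{f"transformer.h.{i}": "cpu" for i in range(12, 24)},
    "transformer.ln_f": "cpu",
    "lm_head": "cpu",
}

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map=device_map,
    load_in_8bit=True,  # the combination this issue asks to support
)
```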
Your contribution
Try coding this enhancement!
Issue Analytics
- Created a year ago
- Comments: 15 (4 by maintainers)
Top GitHub Comments
I’ve just tested that PR and it works. Thank you!
I tested it with a 13B model on an RTX 3060. Without `load_in_8bit`, only 10 layers are able to fit into the GPU. With that patch and `load_in_8bit=True`, 19 layers fit into the GPU, which gives a 30% inference speedup in my case.

For some reason, when I test it on my initial example it produces a warning; however, I was not able to reproduce it in my other, more complex program.
In the PR’s discussion it was said:
I expected this much, but I think it’s still better than nothing.
Though, are there any gotchas arising from the fact that the CPU layers are not converted to 8-bit? (A way to check what was converted is sketched after this comment.)
Also, not sure how to proceed next. You said:
So I suppose this issue should remain open? I will then add more info to my initial issue at the `bitsandbytes` repo.

UPDATE (for future readers): the title was changed.
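One way to probe that question is to inspect the loaded model: modules placed on the GPU should have been replaced by bitsandbytes' `Linear8bitLt`, while modules offloaded to the CPU should remain plain `nn.Linear`. A minimal sketch, assuming a `model` loaded with `load_in_8bit=True` as in the earlier example:

```python
# Assumes `model` was loaded as in the sketch above. Linear8bitLt is
# bitsandbytes' 8-bit linear layer; it subclasses nn.Linear, so it must
# be checked first.
import torch.nn as nn
import bitsandbytes as bnb

for name, module in model.named_modules():
    if isinstance(module, bnb.nn.Linear8bitLt):
        print(f"{name}: 8-bit")
    elif isinstance(module, nn.Linear):
        print(f"{name}: {module.weight.dtype} (not quantized)")
```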
I think that the title of this issue is a little bit misleading. Technically, a custom `device_map` is already supported for `bitsandbytes`, as long as all the layers are on the GPU. For example, the `device_map` from the linked issue works correctly (a hypothetical map of that shape is sketched after this comment). And I believe there will be no problem in using `1` instead of `0` for any `transformer.*` layer if you have more than one GPU (but I may be mistaken; I didn't find any specific info in any docs about using `bitsandbytes` with multiple GPUs). I suppose that replacing every `0` with `1` will also work. So I think users can already customize the device map, as long as it doesn't put anything on the CPU.

The original issue was not about a custom map. It was about supporting the `load_in_8bit` flag for models that are split between CPU and GPU.
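Since the actual map from the linked issue is not reproduced above, here is a hypothetical reconstruction of an all-GPU map of that shape (module names follow the BLOOM layout from the first sketch), together with the multi-GPU variant the comment speculates about:

```python
# Hypothetical all-GPU device_map of the shape the comment describes
# (not the actual map from the linked issue). Every module is pinned
# to GPU 0, so load_in_8bit=True already works with it.
device_map_single_gpu = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    **{f"transformer.h.{i}": 0 for i in range(24)},
    "transformer.ln_f": 0,
    "lm_head": 0,
}

# The multi-GPU variant speculated about above: move the second half of
# the blocks to GPU 1. Every entry is still a GPU device, which is the
# property that matters here (untested, per the comment).
device_map_two_gpus = dict(device_map_single_gpu)
for i in range(12, 24):
    device_map_two_gpus[f"transformer.h.{i}"] = 1
```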