Will the GPT2 775M model finetune on 16GB VRAM and 24GB RAM? (answer is 'yes'; now what about 1.3B?)
I'm using deepspeed 0.4.5+c543a41 (I tried 0.4.4 as well) and transformers-4.9.1, and I attempted earlier versions of everything too.
I was able to finetune GPT2 355M with sequence length 2048 and batch size 1 on this configuration (it is Google Colab); everything fit in VRAM without DeepSpeed. (Strangely, I also tried FP16 here (OM 1) to see its effect, and it did not change much: about 1024MB of VRAM was left free with FP16 and about 625MB without.)
But when I try the 775M model with sequence length 2048, batch size 1, FP16, and CPU offloading, I run out of CPU memory on the first training step. I tried stage 2 zero_optimization with "allgather_bucket_size": 2e8 and "reduce_bucket_size": 2e8, but that leaves 9GB of VRAM free at the moment the process gets killed by Linux for eating all the RAM (I confirmed in htop that the RAM is gone), so that VRAM is wasted. I tried setting both bucket sizes to 2e9; that way only ~300MB of VRAM is left free, but it didn't help with the RAM.
Here are the settings:
"zero_optimization": {
"stage": 2,
"cpu_offload": true,
"allgather_partitions": true,
"allgather_bucket_size": 2e9,
"reduce_scatter": true,
"reduce_bucket_size": 2e9,
"overlap_comm": true,
"contiguous_gradients": true
},
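As a rough sanity check on whether the model states alone can fit, DeepSpeed ships memory estimators that can be run before training. Below is a minimal sketch, assuming the 774M checkpoint is available on the Hub as gpt2-large; the import path has moved between DeepSpeed releases (older versions expose the estimator from deepspeed.runtime.zero.stage2), and the estimate covers only parameters, gradients, and optimizer states, not the activations of a 2048-token sequence.

# Minimal sketch (not the exact script from this issue): ask DeepSpeed how much
# CPU and GPU memory the ZeRO-2 model states need for a ~774M-parameter GPT-2.
# The import path differs across DeepSpeed releases; older versions expose the
# estimator from deepspeed.runtime.zero.stage2 instead of stage_1_and_2.
from transformers import AutoModelForCausalLM
from deepspeed.runtime.zero.stage_1_and_2 import (
    estimate_zero2_model_states_mem_needs_all_live,
)

model = AutoModelForCausalLM.from_pretrained("gpt2-large")  # ~774M parameters
estimate_zero2_model_states_mem_needs_all_live(model,
                                               num_gpus_per_node=1,
                                               num_nodes=1)

The printout lists estimates for the different offload options, which makes it easier to tell whether the RAM exhaustion comes from the optimizer states themselves or from something else (activations, buckets, pinned buffers).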
I also tried to use --sparse-mode alternating with the config:
"sparse_attention": {
"mode": "fixed",
"block": 16,
"different_layout_per_head": true,
"num_local_blocks": 8,
"num_global_blocks": 1,
"attention": "unidirectional",
"horizontal_global_attention": false,
"num_different_global_patterns": 8
}
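For reference, the same fixed pattern can also be expressed through DeepSpeed's programmatic sparse-attention classes; the sketch below mirrors the JSON keys one-to-one, and num_heads=20 is my assumption for the 774M GPT-2 (actually wiring this into the attention layers via SparseSelfAttention additionally requires Triton).

# Sketch only: the "fixed" sparsity pattern above, expressed with DeepSpeed's
# sparse-attention config class. Argument names mirror the JSON keys;
# num_heads=20 is an assumption (the 774M GPT-2 uses 20 attention heads).
from deepspeed.ops.sparse_attention import FixedSparsityConfig

sparsity_config = FixedSparsityConfig(
    num_heads=20,
    block=16,
    different_layout_per_head=True,
    num_local_blocks=8,
    num_global_blocks=1,
    attention="unidirectional",
    horizontal_global_attention=False,
    num_different_global_patterns=8,
)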
But with sparse attention enabled, the only change is that the FP16 loss scale autoscales down to 32768 instead of 1048576, and the RAM is depleted just the same.
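For context on those two numbers: they come from DeepSpeed's dynamic FP16 loss scaler, where 1048576 is 2**20 and 32768 is 2**15, and the behaviour is controlled by the fp16 section of the DeepSpeed config. A sketch of that section is below, written as a Python dict; the values are illustrative, not the exact settings from this run.

# Sketch of the "fp16" section of a DeepSpeed config (illustrative values).
# "loss_scale": 0 selects dynamic loss scaling; initial_scale_power 20 gives
# the 2**20 = 1048576 starting scale mentioned above, and 2**15 = 32768 is the
# value it backed off to.
fp16_section = {
    "enabled": True,
    "loss_scale": 0,             # 0 = dynamic loss scaling
    "initial_scale_power": 20,   # initial scale = 2**20
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1,
}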
I feel that if fine-tuning the 355M model fits in 15GB of VRAM, there should be a way to fit 775M in 16GB VRAM + 24GB RAM. What should I try?
Top GitHub Comments
@tjruwase thank you for your reply, and sorry for the delay; in fact, I only found out about your reply yesterday, because my mail client suddenly tagged everything from GitHub as spam. I am still thinking through my update on this issue and hope to post it in a couple of days.
@tjruwase Hi! Thank you for your efforts. I have pretty much given up on this for now. I believe my last reports still hold true; that is all I know. Do you think we should close it?