
Will GPT2 775M model finetune on 16G VRAM and 24G RAM? (answer is 'yes', now what about 1.3B?)

See original GitHub issue

I’m using deepspeed 0.4.5+c543a41 (tried 0.4.4 as well) and transformers 4.9.1; I attempted earlier versions of everything as well.

I was able to fine-tune GPT2 355M with sequence length 2048 and batch size 1 on this configuration (Google Colab); everything fit in VRAM, without DeepSpeed. (A strange thing: I also tried FP16 here (O1) to see its effect, and it didn’t do much; there were 1024 MB of VRAM left free with FP16 and some 625 MB free without it.)
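
For reference, a minimal sketch of this kind of fine-tuning setup using the Hugging Face Trainer (the checkpoint name, paths, and toy dataset below are placeholders, not the exact ones from the original run):

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast, Trainer, TrainingArguments

# Placeholder checkpoint: "gpt2-medium" is the ~355M model, "gpt2-large" the ~775M one.
# Stock GPT-2 checkpoints only have 1024 position embeddings, so the sketch uses 1024;
# the runs described above use a 2048-token context with a checkpoint that supports it.
model_name = "gpt2-medium"
seq_len = 1024

tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token            # GPT-2 ships without a pad token
model = GPT2LMHeadModel.from_pretrained(model_name)

# Tiny stand-in corpus of fixed-length blocks; replace with the real dataset.
enc = tokenizer(["some training text " * 600] * 8, truncation=True,
                max_length=seq_len, padding="max_length", return_tensors="pt")

class LMDataset(torch.utils.data.Dataset):
    def __init__(self, ids):
        self.ids = ids
    def __len__(self):
        return len(self.ids)
    def __getitem__(self, i):
        return {"input_ids": self.ids[i], "labels": self.ids[i].clone()}

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,    # batch size 1, as above
    fp16=True,                        # toggle to compare VRAM with and without FP16
    deepspeed="ds_config.json",       # drop this argument to train without DeepSpeed
    num_train_epochs=1,
    logging_steps=1,
)

Trainer(model=model, args=args, train_dataset=LMDataset(enc["input_ids"])).train()

With the deepspeed argument present, this is normally launched through the DeepSpeed launcher, e.g. deepspeed --num_gpus=1 train.py, rather than plain python.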

But when I try the 775M model with sequence length 2048 and batch size 1, using FP16 and CPU offloading, I run out of CPU memory on the first step of training. I tried stage 2 zero_optimization with "allgather_bucket_size": 2e8 and "reduce_bucket_size": 2e8, but that leaves me with 9 GB of free VRAM at the moment the process gets killed by Linux for eating all the RAM (I can confirm in htop that the RAM is gone). So that’s a waste of VRAM.

I tried setting both bucket sizes to 2e9; that way only ~300 MB of free VRAM is left, but it didn’t help with RAM.

Here are the settings:

"zero_optimization": {
     "stage": 2,
     "cpu_offload": true,
     "allgather_partitions": true,
     "allgather_bucket_size": 2e9,
     "reduce_scatter": true,
     "reduce_bucket_size": 2e9,
     "overlap_comm": true,
     "contiguous_gradients": true
  }, 
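
As a side note, newer DeepSpeed releases ship a helper that estimates how much GPU and CPU memory the ZeRO-2 model states alone will need; a minimal sketch, assuming the estimator exists in the installed version:

from transformers import GPT2LMHeadModel
from deepspeed.runtime.zero.stage_1_and_2 import estimate_zero2_model_states_mem_needs_all_live

# "gpt2-large" is the ~775M-parameter checkpoint.
model = GPT2LMHeadModel.from_pretrained("gpt2-large")

# Prints estimated GPU and CPU memory needs for the ZeRO-2 model states under the
# available offload options; activations and buffers come on top of these numbers.
estimate_zero2_model_states_mem_needs_all_live(model, num_gpus_per_node=1, num_nodes=1)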

I also tried to use --sparse-mode alternating with the config:

  "sparse_attention": {
    "mode": "fixed",
    "block": 16,
    "different_layout_per_head": true,
    "num_local_blocks": 8,
    "num_global_blocks": 1,
    "attention": "unidirectional",
    "horizontal_global_attention": false,
    "num_different_global_patterns": 8
  }

But the only effect is that the FP16 dynamic loss scale settles at 32768 instead of 1048576, with the same RAM depletion.

I feel like if fine-tuning the 355M model fits in 15 GB of VRAM, there should be a way to fit 775M in 16 GB VRAM + 24 GB RAM. But what should I try?
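
For a rough sense of where the host RAM goes with ZeRO-2 CPU offload and Adam (a back-of-the-envelope estimate only; activations, pinned communication buffers, the data pipeline, and the framework itself all come on top):

# ZeRO-Offload keeps fp32 copies of parameters, gradients, momentum and variance
# on the CPU, i.e. roughly 16 bytes per parameter for an Adam-style optimizer.
params = 775e6
bytes_per_param_on_cpu = 4 + 4 + 4 + 4    # fp32 params + grads + momentum + variance
cpu_gib = params * bytes_per_param_on_cpu / 2**30
print(f"~{cpu_gib:.1f} GiB of host RAM just for offloaded model states")   # ~11.5 GiB

On a 24 GB machine that leaves roughly half the RAM for everything else, which is consistent with the process dying on the first training step, when those states get allocated.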

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

1 reaction
Artyrm commented, Aug 21, 2021

@tjruwase thank you for your reply, and sorry for the delay; in fact, I only found out about your reply yesterday, because my mail suddenly tagged everything from GitHub as spam. I’ve been thinking over my update on the issue here; I hope to post it in a couple of days.

0 reactions
Artyrm commented, May 16, 2022

@tjruwase Hi! Thank you for your efforts. I have pretty much given up on this for now. I believe my last reports still hold true; that’s all I know. Do you think we should close it?

Read more comments on GitHub >
