
Can't run 11 billion model on A100 with 80GB


Hi @craffel @muqeeth @HaokunLiu,

We’re trying to reproduce the T-Few results for a paper, but we’re getting a ‘CUDA out of memory’ error on an A100 with 80GB (your recommended setup).

This is what we’re running:

python -m src.pl_train -c t011b.json+ia3.json+rte.json -k load_weight="pretrained_checkpoints/t011b_ia3_finish.pt" exp_name=t011b_rte_seed42_ia3_pretrained few_shot_random_seed=42 seed=42

We installed according to the README instructions and are using the default settings in the config files. We are able to run the 3 billion model using the command above, just not the 11 billion. Is there anything we are doing wrong?

This is the exception:

CUDA out of memory

Thank you

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 5

Top GitHub Comments

1 reaction
dptam commented, Jul 15, 2022

Sorry, I think the config might be slightly off, as it was meant for the 3B rather than the 11B version. For the 11B variants, to fit into memory, we used a smaller batch size but kept an effective batch size of 8. Our hyperparameters were batch_size=1 grad_accum_factor=8 eval_batch_size=2. Let us know if it still runs out of memory.
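These overrides can presumably be appended through the same -k mechanism used in the original command; a sketch, assuming batch_size, grad_accum_factor, and eval_batch_size are accepted as -k config keys (they match the hyperparameter names above, but the exact key names are an assumption):

python -m src.pl_train -c t011b.json+ia3.json+rte.json -k load_weight="pretrained_checkpoints/t011b_ia3_finish.pt" exp_name=t011b_rte_seed42_ia3_pretrained few_shot_random_seed=42 seed=42 batch_size=1 grad_accum_factor=8 eval_batch_size=2

With batch_size=1 and grad_accum_factor=8, gradients are accumulated over eight micro-batches before each optimizer step, so the effective batch size stays at 8 while only a single example’s activations are held in GPU memory at a time.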

1 reaction
HaokunLiu commented, Jul 12, 2022

Thanks for your interest in our work!

It’s hard to tell from the surface. Could you share the full log with me? And, if you are familiar with PyTorch Lightning, would you mind adding something like print("Memory usage at line [add something here]", torch.cuda.memory_allocated(device=None)) at the start and end of training_step in EncoderDecoder.py?
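A minimal sketch of that instrumentation, assuming a standard PyTorch Lightning module; the tiny linear layer, placeholder loss, and optimizer below are illustrative stand-ins, not the actual EncoderDecoder.py code:

import torch
import pytorch_lightning as pl

class EncoderDecoder(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # Stand-in model; the real EncoderDecoder wraps a T0 checkpoint
        self.model = torch.nn.Linear(16, 1)

    def training_step(self, batch, batch_idx):
        # CUDA memory already allocated when the step begins
        print("Memory usage at start of training_step",
              torch.cuda.memory_allocated(device=None))
        loss = self.model(batch).pow(2).mean()  # placeholder loss
        # Allocated memory again after the forward pass
        print("Memory usage at end of training_step",
              torch.cuda.memory_allocated(device=None))
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())

Comparing the two printouts across steps shows whether allocated memory climbs step over step (state being retained between steps) or spikes past 80GB within a single step (activations too large for the batch size).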


Top Results From Across the Web

NVIDIA A100 Tensor Core GPU
The A100 80GB debuts the world's fastest memory bandwidth at over 2 terabytes per second (TB/s) to run the largest models and datasets....

[DeepSpeed] [success] trained t5-11b on 1x 40GB gpu #9996
I cannot train a 13B multilingual mT5-xxl model on the 8x40GB A100 on aws p4d24xlarge. I am using This config with "fp16":...

Nvidia launches A100 80GB GPU for supercomputers
Nvidia launched its 80GB version of the A100 graphics processing unit (GPU), targeting the graphics and AI chip at supercomputers.

Understand BLOOM, the Largest Open-Access AI, and Run It ...
BLOOM is an open-access multilingual language model that contains 176 billion parameters and was trained for 3.5 months on 384 A100–80GB GPUs.

NVIDIA's A100 GPU - YouTube
11K views, 2 years ago ... It's packing 54 billion transistors, which is the most ever...
