BlenderBot 3 30B. CUDA out of memory.
Bug description
Hello, I have been trying to run BlenderBot 3 30B and I keep getting the same problem. When I run `metaseq-api-local`, CUDA runs out of memory, and when I try to reduce the batch size, the same error keeps popping up regardless of the value I set.
These are my parameters in the `constants.py` file:

```python
MAX_SEQ_LEN = 1024
BATCH_SIZE = 64  # silly high bc we dynamically batch by MAX_BATCH_TOKENS
MAX_BATCH_TOKENS = 1024
DEFAULT_PORT = 6010
MODEL_PARALLEL = 4
TOTAL_WORLD_SIZE = 4
MAX_BEAM = 8
```
I am currently using 4 T4 GPUs on GCP.
Reproduction steps
1. Modify the `constants.py` file as shown above
2. Run `metaseq-api-local`
Expected behavior
The model loads and the API serves requests without running out of CUDA memory.
Logs Please paste the command line output:
```
RuntimeError: CUDA out of memory. Tried to allocate 296.00 MiB (GPU 0; 14.62 GiB total capacity; 13.69 GiB already allocated; 237.00 MiB free; 13.82 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
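As the error message itself suggests, one cheap thing to try is the allocator's `max_split_size_mb` setting via `PYTORCH_CUDA_ALLOC_CONF` (a real PyTorch environment variable). A minimal sketch, where 128 MiB is an arbitrary example value rather than a tuned recommendation; note this only reduces fragmentation and cannot help if the weight shard itself nearly fills the GPU:

```shell
# Ask PyTorch's caching allocator to avoid keeping large free blocks split
# into unusable fragments. Must be set before the process starts.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
echo "$PYTORCH_CUDA_ALLOC_CONF"
```

Launch `metaseq-api-local` from the same shell so the setting is inherited by the server process.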
Additional context
I would really appreciate it if someone could indicate how to change these parameters so that I can run the model with the resources I have. Thank you!
Issue Analytics
- Created a year ago
- Comments: 7 (4 by maintainers)

With the way metaseq is set up, each sharded checkpoint file is loaded onto a single GPU, so I'm not totally sure the situations you listed above are possible in the current setup. Model parallel does not necessarily need to equal total world size: there is a way to run the model with, e.g., model-parallel plus fully-sharded data-parallel (FSDP) shards, but that is not recommended, since FSDP during inference is quite slow due to node communication latency.
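The comment above is consistent with simple back-of-envelope arithmetic. A sketch, assuming the 30B parameters are stored in fp16 and split evenly across the 4 model-parallel ranks (metaseq's exact memory layout may differ):

```python
params = 30e9        # BlenderBot 3 30B parameter count
bytes_per_param = 2  # fp16
model_parallel = 4   # MODEL_PARALLEL from constants.py

# Weight shard each GPU must hold, before any activations or KV cache.
shard_gib = params * bytes_per_param / model_parallel / 2**30
print(f"per-GPU weight shard: {shard_gib:.2f} GiB")

# T4 capacity as reported in the OOM log.
t4_capacity_gib = 14.62
print(f"headroom: {t4_capacity_gib - shard_gib:.2f} GiB")
```

The weights alone come to roughly 14 GiB per GPU against the 14.62 GiB a T4 reports, which is why shrinking the batch size does not make the error go away: almost all of the memory is consumed before a single token is batched.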
Assuming you used the `reshard_model_parallel` script, pull in these changes to your metaseq checkout: https://github.com/facebookresearch/metaseq/pull/170. That should fix the problem.