
Out of memory errors no matter what parameters with DeepSpeed

See original GitHub issue

Using these fairly lightweight parameters:

BATCH_SIZE = 8
LEARNING_RATE = 3e-4

MODEL_DIM = 512
TEXT_SEQ_LEN = 128
DEPTH = 4
HEADS = 4
DIM_HEAD = 64
REVERSIBLE = True
LOSS_IMG_WEIGHT = 7
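
For reference, here is roughly how those hyperparameters map onto the DALLE constructor in dalle_pytorch. This is a minimal sketch: the DiscreteVAE and num_text_tokens values are placeholders to make it self-contained, not values from the issue. (BATCH_SIZE and LEARNING_RATE go to the data loader and optimizer, not the model.)

import torch
from dalle_pytorch import DiscreteVAE, DALLE

# Placeholder VAE so the sketch runs stand-alone; the issue's
# actual VAE checkpoint is not shown.
vae = DiscreteVAE(image_size = 256, num_layers = 3, num_tokens = 8192)

dalle = DALLE(
    dim = 512,                 # MODEL_DIM
    vae = vae,
    num_text_tokens = 10000,   # assumed vocabulary size
    text_seq_len = 128,        # TEXT_SEQ_LEN
    depth = 4,                 # DEPTH
    heads = 4,                 # HEADS
    dim_head = 64,             # DIM_HEAD
    reversible = True,         # REVERSIBLE
    loss_img_weight = 7,       # LOSS_IMG_WEIGHT
).cuda()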

Without DeepSpeed, a single V100 GPU needs only 6356 MB of memory:

[0] Tesla V100-SXM2-16GB | 57'C, 81 % | 6356 / 16160 MB |

When run with DeepSpeed, memory usage immediately balloons to fill each GPU’s 16 GiB, and training runs out of memory before a single iteration completes.
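
For context, the DeepSpeed path wraps the model with the standard engine setup, roughly like this — a sketch of the usual deepspeed.initialize pattern, not the exact code from train_dalle.py; args and deepspeed_config stand in for whatever the script actually passes:

import deepspeed

# Returns a DeepSpeed engine that owns the forward/backward/step cycle.
distr_dalle, distr_opt, _, _ = deepspeed.initialize(
    args = args,
    model = dalle,
    model_parameters = dalle.parameters(),
    config_params = deepspeed_config,
)

loss = distr_dalle(text, images, mask = mask, return_loss = True)
distr_dalle.backward(loss)  # DeepSpeed scales and reduces gradients itself
distr_dalle.step()          # engine step replaces optimizer.step()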

Aside: please don’t take these personally, ha. We have pinned versions and whatnot; I’m just trying to be thorough so I can come back and try to fix them myself.

Traceback (most recent call last):
  File "train_dalle.py", line 271, in <module>
    loss = distr_dalle(text, images, mask = mask, return_loss = True)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/deepspeed/runtime/engine.py", line 914, in forward
    loss = self.module(*inputs, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/DALLE-pytorch/dalle_pytorch/dalle_pytorch.py", line 495, in forward
    loss_img = F.cross_entropy(logits[:, :, self.text_seq_len:], labels[:, self.text_seq_len:], ignore_index=0)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 2422, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 1591, in log_softmax
    ret = input.log_softmax(dim)
RuntimeError: CUDA out of memory. Tried to allocate 394.00 MiB (GPU 0; 15.78 GiB total capacity; 1.80 GiB already allocated; 178.75
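
When chasing this kind of failure, PyTorch’s allocator statistics can show where the memory goes right before the crash. A minimal sketch using standard torch.cuda calls (not code from the issue):

import torch

# Per-device breakdown of allocated vs. reserved memory;
# call right before the forward pass that runs out of memory.
print(torch.cuda.memory_summary(device = 0))

# Or track the numbers programmatically (values are in bytes).
allocated = torch.cuda.memory_allocated(0)
reserved = torch.cuda.memory_reserved(0)
print(f"allocated: {allocated / 2**20:.0f} MiB, reserved: {reserved / 2**20:.0f} MiB")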

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 11 (10 by maintainers)

Top GitHub Comments

1 reaction
janEbert commented, Apr 1, 2021

It absolutely will, I’m sure! 😃 I actually thought about supporting both; we could rename deepspeed_utils to distributed_utils or something and wrap both Horovod and DeepSpeed (and whatever else). We have a lot more experience with Horovod, too, but DeepSpeed just offers more possibilities. Please hit me up if you encounter difficulties.

1 reaction
janEbert commented, Apr 1, 2021

Hey @afiaka87, sadly I haven’t been able to get to this yet, and I probably won’t be able to until the middle of next week. My only suggestion would be to turn some knobs in the deepspeed_config dictionary. Sadly, their two documentation websites (https://www.deepspeed.ai/ and https://deepspeed.readthedocs.io/) are really sparse, with a lot of magic going on in the background that they don’t explain, even in their guides.
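
Concretely, those knobs live in the dictionary passed to deepspeed.initialize. A sketch of the kind of settings worth experimenting with; the values are illustrative, not tested with this model:

deepspeed_config = {
    "train_batch_size": 8,
    "gradient_accumulation_steps": 1,
    "fp16": {
        "enabled": False,  # half precision currently breaks here, see below
    },
    "zero_optimization": {
        "stage": 1,                    # try stages 1 and 2
        "reduce_bucket_size": 5e7,     # smaller buckets lower peak memory
        "allgather_bucket_size": 5e7,  # at the cost of more communication
    },
}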

It’s also my first time using DeepSpeed, so I still have some figuring out and source-code diving to do before I can fix these kinds of issues.

Currently, I see that DeepSpeed can’t handle some things inside the models, which is why half-precision training won’t work out of the box (according to their documentation, it should). Again, I need more time to get into it, which I sadly only have in a week or so. As I’m still pretty clueless about DeepSpeed, maybe someone else with experience will be even faster. 😉
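
One alternative worth sketching while DeepSpeed’s fp16 mode is broken here is PyTorch’s native mixed precision (torch.cuda.amp) in the plain single-GPU path; it keeps half-precision-unsafe ops in float32 automatically. Untested with this model, just the standard pattern:

import torch

scaler = torch.cuda.amp.GradScaler()

# Forward pass runs in mixed precision; loss scaling guards
# against fp16 gradient underflow.
with torch.cuda.amp.autocast():
    loss = dalle(text, images, mask = mask, return_loss = True)

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad()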

On a side note, HuggingFace Transformers also has DeepSpeed support (see this PR). Maybe they have had similar issues.

Some other nice links:


Top Results From Across the Web

Resolve "Out of Memory" Errors - MATLAB & Simulink
Troubleshoot errors when MATLAB cannot allocate the requested memory. ... No matter how you run into memory limits, MATLAB provides several solutions ...
Read more >
DeepSpeed Integration — transformers 4.10.1 documentation
The memory is shared by stage3_max_live_parameters and stage3_max_reuse_distance , so its not additive, its just 2GB total. stage3_max_live_parameters is the ...
Read more >
Why am I getting GPU ran out of memory error here?
In your case, the GPU simply runs out of memory, because your VRAM is too small. 2GB is very few video memory for...
Read more >
ZeRO — DeepSpeed 0.8.0 documentation - Read the Docs
Do not partition parameters smaller than this threshold. Smaller values use less memory, but can greatly increase communication (especially latency-bound ...
Read more >
Memory optimization for rendering in Blender
There are a lot of parameters that dictate memory usage. In this article we will take a deep dive in RAM usage for...
Read more >
