Zero-Offload Doubles VRAM Usage

Hi,

I am testing out DeepSpeed on a single GTX 1070 GPU. Everything works fine until I try to enable “cpu_offload” in the config, which then doubles GPU memory usage from 4GB to 8GB. CPU memory usage increases significantly as expected, so it seems that the data is being copied successfully. If I also enable “overlap_comm,” as recommended here, my system runs out of memory.

Here is my config file:

{
  "train_batch_size": 96,
  "gradient_accumulation_steps": 1,
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 0.00015
    }
  },
  "fp16": {
    "enabled": true
  },
  "amp": {
    "enabled": false
  },
  "gradient_clipping": 1.0,
  "zero_optimization": {
    "stage": 2,
    "cpu_offload": true,
    "contiguous_gradients": true,
    "overlap_comm": false
  }
}

I am on Linux Mint 19 using PyTorch 1.7, CUDA 10.2, and the latest version of DeepSpeed. Let me know if you need any more information. Thank you for your time.

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 17 (14 by maintainers)

Top GitHub Comments

2 reactions
tjruwase commented, Dec 22, 2020

@stas00 Thanks for the nice summary of my discussion and catching my typo as well 😃. @szhengac, I hope you find this summary useful as well.

Hopefully, we can work together to enable the maximum benefits of zero-offload for your models. My suggestion is that we start with a small configuration to nail down the expected memory usage by running with batch size = 1 on 1 GPU.
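One way to set up that baseline, offered as a sketch rather than the maintainers' procedure: take the config posted in the issue, drop the batch size to 1, and build two variants that differ only in cpu_offload, so that their peak GPU memory can be compared run against run. The helper name make_baseline_configs is hypothetical; every key is taken from the config in the issue.

```python
import copy
import json

# Config as posted in the issue (keys and values unchanged).
ISSUE_CONFIG = {
    "train_batch_size": 96,
    "gradient_accumulation_steps": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 0.00015}},
    "fp16": {"enabled": True},
    "amp": {"enabled": False},
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 2,
        "cpu_offload": True,
        "contiguous_gradients": True,
        "overlap_comm": False,
    },
}


def make_baseline_configs(base):
    """Return (offload_off, offload_on) configs with batch size 1.

    Hypothetical helper: the two configs are identical except for
    zero_optimization.cpu_offload, and the input is left unmodified.
    """
    variants = []
    for offload in (False, True):
        cfg = copy.deepcopy(base)
        cfg["train_batch_size"] = 1  # batch size = 1 on 1 GPU
        cfg["gradient_accumulation_steps"] = 1
        cfg["zero_optimization"]["cpu_offload"] = offload
        cfg["zero_optimization"]["overlap_comm"] = False  # keep buffers small
        variants.append(cfg)
    return tuple(variants)


if __name__ == "__main__":
    offload_off, offload_on = make_baseline_configs(ISSUE_CONFIG)
    print(json.dumps(offload_on, indent=2))
```

Each dictionary can be written to its own JSON file and used for one run. Changing only one setting between the two runs keeps any difference in memory usage attributable to cpu_offload.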

2 reactions
stas00 commented, Dec 22, 2020

This is the summary I saved away from this discussion:

ZeRO features that decrease GPU memory usage

  • CPU offload

    • zero_optimization.cpu_offload=true, requires zero_optimization.stage=2
    • cpu_offload should reduce GPU RAM usage.

ZeRO features that increase GPU memory usage

  • zero_optimization.allgather_bucket_size and zero_optimization.reduce_bucket_size have the biggest impact on memory usage during zero_optimization.stage=2

    both default to 500000000 => 1GB buffer (5e8 x 2Bytes)

  • Overlap comm - increases gpu memory requirements

    • zero_optimization.overlap_comm=true trades increased GPU RAM usage for lower all-reduce latency.

    • overlap_comm uses 4.5x the zero_optimization.allgather_bucket_size and zero_optimization.reduce_bucket_size, resulting in a 9GB footprint by default (5e8 x 2 bytes x 2 x 4.5). For example, this 2.5x reduction in buffer size may be sufficient:

      "allgather_bucket_size": 200000000, "reduce_bucket_size": 200000000,
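The arithmetic in this summary can be reproduced with a short sketch. It only restates the formulas quoted above (2 bytes per fp16 element, two buckets, a 4.5x multiplier with overlap_comm). It is not derived from the DeepSpeed source or from measurements, and the function names are hypothetical.

```python
BYTES_PER_FP16 = 2         # fp16 element size used in the summary
NUM_BUCKETS = 2            # allgather_bucket_size and reduce_bucket_size
OVERLAP_COMM_FACTOR = 4.5  # multiplier quoted in the summary, taken as given
GB = 1e9                   # decimal gigabytes, as in the summary's figures


def single_bucket_bytes(bucket_size):
    """Bytes held by one bucket of `bucket_size` fp16 elements."""
    return bucket_size * BYTES_PER_FP16


def overlap_comm_footprint_bytes(bucket_size):
    """Estimated footprint of both buckets with overlap_comm=true.

    Assumes both bucket sizes are set to the same value, as in the
    example config line above.
    """
    return single_bucket_bytes(bucket_size) * NUM_BUCKETS * OVERLAP_COMM_FACTOR


if __name__ == "__main__":
    for size in (500_000_000, 200_000_000):
        print(
            f"bucket_size={size}: "
            f"{single_bucket_bytes(size) / GB:.1f} GB per bucket, "
            f"{overlap_comm_footprint_bytes(size) / GB:.1f} GB with overlap_comm"
        )
```

Under these assumptions the default of 500000000 gives 1 GB per bucket and 9 GB with overlap_comm, while 200000000 gives 0.4 GB per bucket and 3.6 GB with overlap_comm. Whether that fits on an 8GB card alongside the model itself would still need to be checked by running it.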
