Can activation_checkpointing offload to NVMe?
See original GitHub issue.
The config has a cpu_checkpointing option under activation_checkpointing; why can't the activation checkpoints be offloaded to NVMe as well?
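For context, a minimal sketch of the existing activation_checkpointing section is shown below, assuming the documented key names; the values are illustrative. Note that cpu_checkpointing targets CPU RAM only, and this section currently has no NVMe device option, which is what the issue asks about.

```python
# Minimal sketch of DeepSpeed's activation_checkpointing config section.
# cpu_checkpointing moves partitioned activation checkpoints to CPU memory;
# there is no "nvme" target here.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,        # illustrative value
    "activation_checkpointing": {
        "partition_activations": True,          # required for cpu_checkpointing
        "cpu_checkpointing": True,              # offload checkpoints to CPU RAM
        "contiguous_memory_optimization": True,
        "number_checkpoints": 4,                # illustrative; depends on model depth
        "synchronize_checkpoint_boundary": False,
        "profile": False,
    },
}
# In recent DeepSpeed versions this dict can be passed to
# deepspeed.initialize(model=..., config=ds_config) or saved as a JSON config file.
```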
Issue Analytics
- Created: 2 years ago
- Comments: 13 (13 by maintainers)
Top Results From Across the Web
- [BUG] NVMe Offload, error while fetching submodule ... - GitHub: Describe the bug I want to test ZeRO-Infinity NVMe offload for large ... /deepspeed/runtime/activation_checkpointing/checkpointing.py", ...
- DeepSpeed Integration - Hugging Face: You can choose to offload both optimizer states and params to NVMe, or just one of them or none. For example, if you... (a ZeRO-3 NVMe config sketch follows this list)
- Train 1 trillion+ parameter models - PyTorch Lightning: Do not wrap the entire model with activation checkpointing. ... Additionally, DeepSpeed supports offloading to NVMe drives for even larger models, ...
- ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale ...: A critical question of offloading to CPU and NVMe memory is whether their limited ... Model states and activation checkpoints can have varying...
- Activation Checkpointing - Amazon SageMaker: Activation checkpointing (or gradient checkpointing) is a technique to reduce memory usage by clearing activations of certain layers and recomputing them ...
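For comparison with the CPU-only activation option above, here is a hedged sketch of how ZeRO stage 3 (ZeRO-Infinity) offloads parameters and optimizer states to NVMe. The offload_param, offload_optimizer, and aio keys are documented DeepSpeed config options, but the nvme_path and the numeric values are placeholders, and none of these options cover activation checkpoints.

```python
# Sketch of a ZeRO stage-3 config that offloads parameters and optimizer
# states to NVMe (ZeRO-Infinity). Activation checkpoints are NOT covered by
# these options; "/local_nvme" is a placeholder path for a local NVMe mount.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,        # illustrative value
    "zero_optimization": {
        "stage": 3,
        "offload_param": {
            "device": "nvme",
            "nvme_path": "/local_nvme",
            "pin_memory": True,
        },
        "offload_optimizer": {
            "device": "nvme",
            "nvme_path": "/local_nvme",
            "pin_memory": True,
        },
    },
    # Asynchronous I/O tuning for NVMe reads/writes; values are placeholders.
    "aio": {
        "block_size": 1048576,
        "queue_depth": 8,
        "single_submit": False,
        "overlap_events": True,
        "thread_count": 1,
    },
}
```

Other settings a real run would need (fp16/bf16, optimizer, scheduler) are omitted for brevity.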
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@tjruwase Thanks. I'm looking forward to that. DeepSpeed is an incredible library; I never thought I could train a 2.6B-parameter model on a 2080Ti GPU, and fast enough too. I learned a lot from this library, its articles, and its papers. Thanks again.
Do you have a roadmap for the DeepSpeed project? For example, plans for new features, or a big refactor like the one Transformers (https://github.com/huggingface/transformers/) did (they modularized their single huge classes/files).
@ghosthamlet, I am glad that things are working now. However, I am going to keep this issue open to address the original request.