Question: Can `DeepSpeedCPUAdam` be used as a drop-in replacement for `torch.optim.Adam`?
Hi,

I want to use `DeepSpeedCPUAdam` instead of `torch.optim.Adam` to reduce the memory usage of my GPUs while training. I was wondering whether `DeepSpeedCPUAdam` can simply be dropped in in place of `torch.optim.Adam`, or whether additional steps are needed. I tried to do exactly that and got a segmentation fault.

Thanks
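For reference, a minimal sketch of the swap itself (untested; it assumes `deepspeed` is installed in a CUDA-capable environment, and the model and hyperparameters are placeholders). One caveat worth hedging: `DeepSpeedCPUAdam`'s C++ kernel updates CPU-resident tensors, so stepping over GPU-resident parameters is a plausible source of the segmentation fault described above.

```python
import torch
from deepspeed.ops.adam import DeepSpeedCPUAdam

# Placeholder model, deliberately kept on the CPU: the parameters
# DeepSpeedCPUAdam steps over should live in host memory.
model = torch.nn.Linear(1024, 1024)

# Before: optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
optimizer = DeepSpeedCPUAdam(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.0,
    adamw_mode=False,  # defaults to True (AdamW); False matches torch.optim.Adam
)

# One illustrative training step on dummy data.
loss = model(torch.randn(4, 1024)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```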
Issue Analytics
- Created 3 years ago
- Comments: 10 (6 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@tjruwase thanks, I opened a new issue.
@peterukk, DeepSpeedCPUAdam will not work without CUDA (in theory it could). The reason is that DeepSpeedCPUAdam has a mode of execution where it also copies the updated parameters back to the GPU using CUDA kernels. Do you have a scenario where you want to use DeepSpeedCPUAdam outside a CUDA environment? Depending on your answer, could you please open a new question or reopen this one, as appropriate? Thanks.
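For completeness, a hedged sketch of the usual supported path, where DeepSpeed instantiates `DeepSpeedCPUAdam` itself as part of ZeRO-Offload rather than the user constructing it by hand. Config keys vary across DeepSpeed versions, and all values below are illustrative placeholders:

```python
import deepspeed
import torch

model = torch.nn.Linear(1024, 1024)  # placeholder model

ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        # Offloading optimizer state to the CPU makes DeepSpeed select
        # DeepSpeedCPUAdam internally (exact keys depend on the version).
        "offload_optimizer": {"device": "cpu"},
    },
}

# deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler).
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

Such a script is typically run through the DeepSpeed launcher (e.g. `deepspeed train.py`), which sets up the distributed environment that `deepspeed.initialize` expects.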