Question: Can `DeepSpeedCPUAdam` be used as a drop-in replacement for `torch.optim.Adam`?
Hi,

I want to use `DeepSpeedCPUAdam` instead of `torch.optim.Adam` to reduce the memory usage of my GPUs while training. I was wondering whether `DeepSpeedCPUAdam` can simply be dropped in in place of `torch.optim.Adam`, or whether additional steps are needed. I tried to do exactly that and got a segmentation fault.

Thanks
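For reference, a minimal sketch of the swap itself (untested; it assumes `deepspeed` is installed in a CUDA-capable environment, and the model and hyperparameters are placeholders). One caveat worth hedging: `DeepSpeedCPUAdam`'s C++ kernel updates CPU-resident tensors, so stepping over GPU-resident parameters is a plausible source of the segmentation fault described above.

```python
import torch
from deepspeed.ops.adam import DeepSpeedCPUAdam

# Placeholder model, deliberately kept on the CPU: the parameters
# DeepSpeedCPUAdam steps over should live in host memory.
model = torch.nn.Linear(1024, 1024)

# Before: optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
optimizer = DeepSpeedCPUAdam(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.0,
    adamw_mode=False,  # defaults to True (AdamW); False matches torch.optim.Adam
)

# One illustrative training step on dummy data.
loss = model(torch.randn(4, 1024)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```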
Issue Analytics
- Created 3 years ago
- Comments: 10 (6 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@tjruwase thanks, I opened a new issue.
@peterukk, DeepSpeedCPUAdam will not work without CUDA (in theory it could). The reason is that DeepSpeedCPUAdam has a mode of execution where it also copies the updated parameters back to the GPU using CUDA kernels. Do you have a scenario where you want to use DeepSpeedCPUAdam outside a CUDA environment? Depending on your answer, could you please open a new question or reopen this one, as appropriate? Thanks.
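For completeness, a hedged sketch of the usual supported path, where DeepSpeed instantiates `DeepSpeedCPUAdam` itself as part of ZeRO-Offload rather than the user constructing it by hand. Config keys vary across DeepSpeed versions, and all values below are illustrative placeholders:

```python
import deepspeed
import torch

model = torch.nn.Linear(1024, 1024)  # placeholder model

ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        # Offloading optimizer state to the CPU makes DeepSpeed select
        # DeepSpeedCPUAdam internally (exact keys depend on the version).
        "offload_optimizer": {"device": "cpu"},
    },
}

# deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler).
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

Such a script is typically run through the DeepSpeed launcher (e.g. `deepspeed train.py`), which sets up the distributed environment that `deepspeed.initialize` expects.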