Deprecate the nvidia/apex integration
Proposed refactor
Deprecation:
- Deprecate the `ApexMixedPrecisionPlugin` and passing `Trainer(amp_backend=...)` (see the migration sketch below). To be removed in 1.10
- Add deprecation notices for apex throughout our docs
Removal:
- Remove all the apex-related glue throughout the codebase
- Remove the apex installation from CI
- Remove apex from our docs
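For reference, a minimal sketch of the user-facing change, assuming today's `amp_backend`/`amp_level` Trainer arguments; the replacement call is illustrative, not a final API:

```python
from pytorch_lightning import Trainer

# Deprecated path: apex-backed mixed precision, selected via amp_backend/amp_level.
trainer = Trainer(accelerator="gpu", devices=1, precision=16, amp_backend="apex", amp_level="O2")

# Replacement: native PyTorch AMP (torch.cuda.amp), which is already the default backend.
trainer = Trainer(accelerator="gpu", devices=1, precision=16)
```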
Motivation
APEX AMP can be regarded as deprecated in favor of PyTorch AMP, which Michael Carilli implemented and advocated for in #1337.
Most developer activity in the nvidia/apex repository happens in apex/transformer, apex/optimizers, tests/L0, and/or apex/contrib; the apex/amp directory hasn't seen changes in about 2 years.
Given this 2-year hibernation, it would be almost impossible to resume support for the different optimization levels up to O2.
It's unclear whether any NVIDIA teams use our apex plugin internally, and the NVIDIA team is unable to provide support for apex bugs.
If you enjoy Lightning, check out our other projects! ⚡
- Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
- Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
- Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
- Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
- Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.
cc @tchaton @rohitgr7 @carmocca @justusschock @awaelchli @akihironitta @kaushikb11 @borda
The native amp implementation via `torch.amp` or `torch.cuda.amp` is close to the legacy `apex.amp` O1 opt level, while the legacy O3 level was mainly used for debugging and performance testing as it's the "pure FP16" implementation (it calls `.half()` on the data and model directly, which can be dangerous). I agree that deprecating `apex.amp` from Lightning sounds like a good idea so that we can focus on the native implementation.

I wonder if DeepSpeed is impacted by the same checkpointing problems when apex is used.
I don’t think so. It might not provide relevant efficiency improvements for most use cases so maybe they scrapped supporting it. @ptrblck, if you have any insights here, we’d love to hear them from you 🙇
Also relevant: https://github.com/pytorch/pytorch/issues/52279
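To make the opt-level comparison above concrete, here is a minimal sketch (assuming a CUDA device; the model and data are made up for illustration) of the native `torch.cuda.amp` path next to a "pure FP16" `.half()` cast in the spirit of the legacy O3 level:

```python
import torch
from torch import nn

model = nn.Linear(16, 4).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
data = torch.randn(8, 16, device="cuda")
target = torch.randn(8, 4, device="cuda")

# Native AMP, roughly equivalent to the legacy apex.amp O1 level: ops autocast to
# FP16 where safe, parameters stay in FP32, and GradScaler handles loss scaling.
scaler = torch.cuda.amp.GradScaler()
with torch.cuda.amp.autocast():
    loss = nn.functional.mse_loss(model(data), target)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad()

# "Pure FP16" in the spirit of the legacy O3 level: cast the model and data
# directly with .half(), with no autocasting or loss scaling (numerically risky).
model_fp16 = nn.Linear(16, 4).cuda().half()
loss_fp16 = nn.functional.mse_loss(model_fp16(data.half()), target.half())
loss_fp16.backward()
```

The native path keeps the parameters in FP32 and scales the loss, which is why it maps most closely to O1; the direct `.half()` cast has no such safeguards.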