torch.cuda.amp > apex.amp
For a while now my main focus has been moving mixed precision functionality into PyTorch core. It was merged about a month ago: https://pytorch.org/docs/master/amp.html and https://pytorch.org/docs/master/notes/amp_examples.html, and it is now usable via master or the nightly pip/conda packages. (The full feature set did not make the 1.5 release, unfortunately.)
torch.cuda.amp is more flexible and intuitive, and the native integration brings more future optimizations into scope. Also, torch.cuda.amp fixes many of apex.amp's known pain points. Some things native amp can handle that apex amp can't:
- Guaranteed PyTorch version compatibility, because it's part of PyTorch
- No need to build extensions
- Windows support
- Bitwise accurate saving/restoring
- DataParallel and intra-process model parallelism (although we still recommend torch.nn.DistributedDataParallel with one GPU per process as the most performant approach)
- Gradient penalty (double backward)
- torch.cuda.amp.autocast() has no effect outside regions where it's enabled, so it should serve cases that formerly struggled with multiple calls to apex.amp.initialize() (including cross-validation) without difficulty. Multiple convergence runs in the same script should each use a fresh GradScaler instance, but GradScalers are lightweight and self-contained so that's not a problem.
- Sparse gradient support
If all you want is to try mixed precision, and you're comfortable using a recent PyTorch, you don't need Apex.
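For reference, the typical training-loop pattern from the amp examples page linked above looks roughly like this. This is a minimal sketch; `model`, `optimizer`, `loader`, and `loss_fn` are placeholders, not names from the issue.

```python
import torch

# GradScaler handles dynamic loss scaling for the backward pass.
scaler = torch.cuda.amp.GradScaler()

for inputs, targets in loader:
    optimizer.zero_grad()

    # autocast runs the forward pass with ops cast to FP16 where that is safe.
    with torch.cuda.amp.autocast():
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)

    # Scale the loss so small FP16 gradients don't underflow.
    scaler.scale(loss).backward()
    # Unscale gradients, skip the step if any are inf/NaN, then step the optimizer.
    scaler.step(optimizer)
    # Adjust the scale factor for the next iteration.
    scaler.update()
```

Because autocast is just a context manager, it composes naturally with the cross-validation and multiple-run scenarios mentioned in the list above.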
@mcarilli what about the opt_level O1 / O2, etc.? I can't find whether that's already natively supported by torch.cuda.amp - it looks like there's no opt_level option in torch.cuda.amp? If so, which opt_level is used by default with autocast?

@mcarilli hi, thanks for your great work! In my task, compared to opt_level O1, opt_level O2 trains faster with no damage to performance. So is there any workaround to get amp behavior like O2? Can I just cast the model weights to FP16 (except batch norm, etc.) before training? Like:
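A rough sketch of that kind of manual cast might look like the following. This is an illustration only, not the snippet omitted from the comment above; the helper name is made up, and it keeps BatchNorm layers in FP32, similar in spirit to how apex O2 treats the model.

```python
import torch
import torch.nn as nn

def half_except_batchnorm(model: nn.Module) -> nn.Module:
    # Hypothetical helper: cast all floating-point params/buffers to FP16,
    # then cast BatchNorm layers back to FP32 for numerical stability.
    model.half()
    for module in model.modules():
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            module.float()
    return model
```

Note that inputs would also need to be cast to FP16, and keeping FP32 master weights in the optimizer (as apex O2 does) is a separate concern that this sketch does not handle.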