torch.cuda.amp > apex.amp
For a while now my main focus has been moving mixed precision functionality into PyTorch core. It was merged about a month ago: https://pytorch.org/docs/master/amp.html and https://pytorch.org/docs/master/notes/amp_examples.html, and it is now usable via master or the nightly pip/conda packages. (The full feature set did not make the 1.5 release, unfortunately.)
torch.cuda.amp is more flexible and intuitive, and the native integration brings more future optimizations into scope. Also, torch.cuda.amp fixes many of apex.amp's known pain points. Some things native amp can handle that apex amp can't:
- Guaranteed PyTorch version compatibility, because it's part of PyTorch
- No need to build extensions
- Windows support
- Bitwise accurate saving/restoring
- DataParallel and intra-process model parallelism (although we still recommend torch.nn.DistributedDataParallel with one GPU per process as the most performant approach)
- Gradient penalty (double backward)
- torch.cuda.amp.autocast() has no effect outside regions where it's enabled, so it should serve cases that formerly struggled with multiple calls to apex.amp.initialize() (including cross-validation) without difficulty. Multiple convergence runs in the same script should each use a fresh GradScaler instance, but GradScalers are lightweight and self-contained so that's not a problem.
- Sparse gradient support
If all you want is to try mixed precision, and you're comfortable using a recent PyTorch, you don't need Apex.
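For reference, the typical training-loop pattern from the amp examples page linked above looks roughly like this. This is a minimal sketch; `model`, `optimizer`, `loader`, and `loss_fn` are placeholders, not names from the issue.

```python
import torch

# GradScaler handles dynamic loss scaling for the backward pass.
scaler = torch.cuda.amp.GradScaler()

for inputs, targets in loader:
    optimizer.zero_grad()

    # autocast runs the forward pass with ops cast to FP16 where that is safe.
    with torch.cuda.amp.autocast():
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)

    # Scale the loss so small FP16 gradients don't underflow.
    scaler.scale(loss).backward()
    # Unscale gradients, skip the step if any are inf/NaN, then step the optimizer.
    scaler.step(optimizer)
    # Adjust the scale factor for the next iteration.
    scaler.update()
```

Because autocast is just a context manager, it composes naturally with the cross-validation and multiple-run scenarios mentioned in the list above.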
@mcarilli what about the opt_level O1 / O2, etc.? I can't find whether that's already natively supported by torch.cuda.amp - it looks like there's no opt_level option in torch.cuda.amp? If so, which opt_level is used by default with autocast?

@mcarilli hi, thanks for your great work! In my task, compared to opt_level O1, opt_level O2 trains faster with no damage to performance. So is there any workaround to get amp behavior like O2? Can I just cast the model weights to FP16 (except batch norm, etc.) before training? Like:
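A rough sketch of that kind of manual cast might look like the following. This is an illustration only, not the snippet omitted from the comment above; the helper name is made up, and it keeps BatchNorm layers in FP32, similar in spirit to how apex O2 treats the model.

```python
import torch
import torch.nn as nn

def half_except_batchnorm(model: nn.Module) -> nn.Module:
    # Hypothetical helper: cast all floating-point params/buffers to FP16,
    # then cast BatchNorm layers back to FP32 for numerical stability.
    model.half()
    for module in model.modules():
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            module.float()
    return model
```

Note that inputs would also need to be cast to FP16, and keeping FP32 master weights in the optimizer (as apex O2 does) is a separate concern that this sketch does not handle.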