
Enable automatic mixed precision for XLA

See original GitHub issue

Feature

Automatic mixed precision for XLA has landed in PyTorch 1.8.1 and the torch/xla nightly build. We should enable it in the create_supervised_* helper functions.
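
A minimal sketch of what an AMP training step on an XLA device conceptually looks like (single process, illustration only). The exact autocast/GradScaler entry points for XLA differ between torch / torch_xla releases (torch_xla later ships its own amp helpers), so treat this as a conceptual outline rather than the final Ignite code:

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = torch.nn.Linear(128, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()  # torch_xla also provides its own GradScaler variant

def train_step(x, y):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():    # run the forward pass in reduced precision
        y_pred = model(x)
        loss = loss_fn(y_pred, y)
    scaler.scale(loss).backward()      # scale the loss so fp16 gradients don't underflow
    scaler.step(optimizer)             # unscale gradients and apply the optimizer step
    scaler.update()
    xm.mark_step()                     # flush the lazily built XLA graph
    return loss.item()
```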

Suggested solution

Remove the xla and amp checks in _check_arg().

  • For create_supervised_trainer, update the supervised_training_step_tpu() function to accept a scaler argument, just like supervised_training_step_amp() (see the sketch after this list).
  • For create_supervised_evaluator, simply removing the xla and amp checks in _check_arg() should be enough.
  • For the tests, we could remove the xla checks and only run with PyTorch 1.8.1.
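
Illustrative only: one possible shape for supervised_training_step_tpu() with an optional scaler argument, mirroring supervised_training_step_amp(). This is a sketch of the suggestion above, not Ignite's actual implementation; the helper names and defaults here are assumptions.

```python
import torch
import torch_xla.core.xla_model as xm
from torch.cuda.amp import autocast

def _prepare_batch(batch, device=None, non_blocking=False):
    x, y = batch
    return x.to(device, non_blocking=non_blocking), y.to(device, non_blocking=non_blocking)

def supervised_training_step_tpu(model, optimizer, loss_fn, device=None,
                                 prepare_batch=_prepare_batch,
                                 output_transform=lambda x, y, y_pred, loss: loss.item(),
                                 scaler=None):
    def update(engine, batch):
        model.train()
        optimizer.zero_grad()
        x, y = prepare_batch(batch, device=device)
        with autocast(enabled=scaler is not None):   # autocast only when a scaler is given
            y_pred = model(x)
            loss = loss_fn(y_pred, y)
        if scaler is not None:
            scaler.scale(loss).backward()
            # NOTE: a real implementation still needs the gradient reduction that
            # xm.optimizer_step() performs in multi-process runs; glossed over here.
            scaler.step(optimizer)
            scaler.update()
            xm.mark_step()
        else:
            loss.backward()
            xm.optimizer_step(optimizer, barrier=True)  # reduce grads + step on TPU
        return output_transform(x, y, y_pred, loss)

    return update
```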

Additional context

This feature should not be included in an Ignite release until the next torch and torch/xla releases come out.

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

vfdev-5 commented, Sep 13, 2021 (1 reaction)

@01-vyom thanks a lot for the tests and the feedback!

For TPUs, can you please check the updated Ignite code (TPU + AMP) on the CIFAR10 dataset in Colab, and see whether it trains and is faster than without AMP?
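
A hedged timing sketch for that comparison, assuming the proposed change is in place so that create_supervised_trainer accepts amp_mode="amp" together with an XLA device; the amp_mode/scaler keyword names follow Ignite's GPU AMP helpers, and the model/optimizer/loss/data loader are assumed to be set up for CIFAR10 elsewhere:

```python
import time
import torch_xla.core.xla_model as xm
from ignite.engine import create_supervised_trainer

def time_one_epoch(model, optimizer, loss_fn, train_loader, amp_mode=None, scaler=False):
    device = xm.xla_device()
    trainer = create_supervised_trainer(
        model.to(device), optimizer, loss_fn,
        device=device, amp_mode=amp_mode, scaler=scaler,
    )
    start = time.time()
    trainer.run(train_loader, max_epochs=1)
    return time.time() - start

# Usage:
# fp32_time = time_one_epoch(model, optimizer, loss_fn, train_loader)
# amp_time  = time_one_epoch(model, optimizer, loss_fn, train_loader, amp_mode="amp", scaler=True)
```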

As for GPUs, maybe if we could clearly understand how to install xla with GPU support on an infrastructure with GPUs (using Docker), we could test that from our side as well. In that case, we might have more luck with matching versions. Here is an example of how to install torch_xla locally with CPU support: https://github.com/pytorch/ignite/blob/master/.github/workflows/tpu-tests.yml#L52-L61

01-vyom commented, Sep 12, 2021 (1 reaction)

I checked it out on Colab for TPUs: it works on TPU. The code used is from the following issue: https://github.com/pytorch/pytorch/issues/61804

I tried to test on GPU, but I am not able to match the versions of pytorch-xla and PyTorch CUDA.

Also, multiple developers on xla and pytorch suggest that AMP will run only on GPU, since TPU supposedly doesn't support float16 (which it does, according to my tests). https://github.com/pytorch/pytorch/pull/48570#discussion_r536282158
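
A tiny sketch of the kind of check behind the "which it does" remark: create float16 tensors on the XLA device and run an op. How float16 is actually lowered on TPU (where bfloat16 is the native reduced-precision type) is up to the XLA backend, so this only verifies that the op executes and keeps the dtype:

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
a = torch.randn(64, 64, dtype=torch.float16, device=device)
b = torch.randn(64, 64, dtype=torch.float16, device=device)
c = a @ b
xm.mark_step()            # materialize the lazily built XLA graph
print(c.dtype, c.device)  # expected: torch.float16 on the XLA device
```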

Moreover, the tests that they have included for autocast only run with XLA:GPU and XLA:CPU: https://github.com/pytorch/xla/blob/81da600883f0d6342b19749cc08be18b8decc051/test/test_train_mp_imagenet.py#L30-L33

https://github.com/pytorch/xla/pull/3089

So, I think it works on both GPU and TPU, but their codebase only shows/acknowledges CPU and GPU.
