
Using FP16_Optimizer is not much faster

See original GitHub issue

I ran the following scripts and compared their logs.
fp32 training:

python main_fp16_optimizer.py /workspace/data/imagenet

and fp16 mixed precision training:

python main_fp16_optimizer.py /workspace/data/imagenet --fp16

Here are their logs. The fp32 training log:

Epoch: [0][10/1563]     Time 0.211 (0.507)      Speed 151.834 (63.162)  Data 0.001 (0.075)      Loss 7.0819 (7.0585)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.000)

and the fp16 mixed precision training log:

Epoch: [0][10/1563]     Time 0.220 (0.530)      Speed 145.334 (60.358)  Data 0.001 (0.068)      Loss 7.1602 (7.0614)    Prec@1 0.000 (0.852)    Prec@5 0.000 (1.136)

It’s easy to see that the mixed precision training version isn’t much faster, so is there anything wrong?

By the way, I used a single GPU. Thanks.
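
For context on what the --fp16 flag changes: the example wraps the base optimizer in apex’s FP16_Optimizer, which keeps FP32 master weights and applies loss scaling. A minimal sketch of that pattern (simplified, with a placeholder model; not the exact example code):

    import torch
    from apex.fp16_utils import FP16_Optimizer, network_to_half

    # Placeholder model just for illustration; the real example builds a ResNet.
    model = network_to_half(torch.nn.Linear(1024, 1000).cuda())
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    # FP16_Optimizer keeps FP32 master copies of the FP16 params and rescales grads.
    optimizer = FP16_Optimizer(optimizer, dynamic_loss_scale=True)

    inp = torch.randn(32, 1024, device="cuda")
    target = torch.randint(0, 1000, (32,), device="cuda")

    loss = torch.nn.functional.cross_entropy(model(inp).float(), target)

    optimizer.zero_grad()
    optimizer.backward(loss)   # replaces loss.backward(); handles loss scaling
    optimizer.step()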

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments:6 (3 by maintainers)

Top GitHub Comments

3 reactions
mcarilli commented, Feb 12, 2019

What GPU are you using? For those particular examples, I would only expect to see significant speedups on a device with Tensor Cores (Volta or Turing). Other architectures would benefit from the reduced bandwidth requirements of FP16, but the compute won’t be faster than FP32 (and for some Pascal cards like the 1080Ti, the compute throughput is actually much slower in FP16).
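
A quick way to check which category a given card falls into, using standard PyTorch calls (a small sketch; device index 0 is assumed):

    import torch

    # Volta is compute capability 7.0, Turing is 7.5; anything >= (7, 0) has Tensor Cores.
    major, minor = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    print(f"{name}: compute capability {major}.{minor}, "
          f"Tensor Cores: {(major, minor) >= (7, 0)}")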

0 reactions
mcarilli commented, Feb 20, 2019

In general, GPUs like contiguous tensors in which the beginning of each fastest-dim row is aligned to at least 32 bytes. The change you made may have helped with that requirement for some ops in the network, so the speedup you observed may have had nothing to do with cuDNN. Then again, it might also have made cuDNN’s padding job easier (cuDNN needs to transpose the data at certain points, and inserts padding while it transposes).
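
One illustrative reading of that alignment requirement (an assumption here: the base allocation itself is already 32-byte aligned, which CUDA allocations are, so only the row pitch in bytes matters): with 2-byte FP16 elements, the last dimension should be a multiple of 16 for each row to start on a 32-byte boundary. A small sketch of that check:

    import torch

    def rows_32B_aligned(t: torch.Tensor) -> bool:
        # Heuristic: do consecutive rows along the fastest-moving (last) dim
        # start at 32-byte boundaries? Requires a contiguous tensor and a
        # 32-byte-aligned base pointer, then checks the row pitch in bytes.
        if not t.is_contiguous() or t.dim() < 2:
            return False
        row_pitch_bytes = t.stride(-2) * t.element_size()
        return t.data_ptr() % 32 == 0 and row_pitch_bytes % 32 == 0

    device = "cuda" if torch.cuda.is_available() else "cpu"
    good = torch.empty(64, 224, dtype=torch.half, device=device)  # 224 * 2 B = 448 B pitch
    bad = torch.empty(64, 225, dtype=torch.half, device=device)   # 225 * 2 B = 450 B pitch
    print(rows_32B_aligned(good), rows_32B_aligned(bad))          # True False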

Alright, I’m going to be updating the documentation substantially anyway for the merge of my “Amp 1.0” release by the end of the month. I’m giving a webinar about that today if you’re interested. https://info.nvidia.com/webinar-mixed-precision-with-pytorch-reg-page.html Sorry, I should have remembered to say that earlier. I will post the presentation afterwards.
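
For anyone reading this later: the Amp release referred to above shipped as apex.amp, and the usage pattern looks roughly like the sketch below (placeholder model and data; see the apex documentation for the authoritative version):

    import torch
    from apex import amp

    model = torch.nn.Linear(1024, 1000).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # opt_level="O1" patches common ops to run in FP16 where it is considered safe.
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

    inp = torch.randn(32, 1024, device="cuda")
    target = torch.randint(0, 1000, (32,), device="cuda")
    loss = torch.nn.functional.cross_entropy(model(inp), target)

    optimizer.zero_grad()
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()   # gradients are scaled to avoid FP16 underflow
    optimizer.step()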


Top Results From Across the Web

  • How To Fit a Bigger Model and Train It Faster - Hugging Face
    While bf16 has a worse precision than fp16, it has a much much bigger dynamic range. Therefore, if in the past you were...
  • Train With Mixed Precision - NVIDIA Documentation Center
    Third, math operations run much faster in reduced precision, especially on GPUs with Tensor Core support for that precision.
  • Mixed precision training - fastai
    So training at half precision is better for your memory usage, way faster if you have a Volta GPU (still a tiny bit...
  • Using mixed precision training with Gradient - Paperspace Blog
    On the other hand, deep learning with FP16 takes less memory and runs more quickly, but with less precision in the data and...
  • PyTorch Quick Tip: Mixed Precision Training (FP16) - YouTube
    FP16 approximately doubles your VRAM and trains much faster on newer GPUs. I think everyone should use this as a default.
