Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

--fp16 Slower & Does Not Reduce Memory Use

See original GitHub issue

Hey there @lucidrains,

Came across your incredible work and immediately tried it out on my RTX 2070! Since training will take some time and require a lot of memory, I was relieved that we can use APEX/Amp to train the model simply by adding the --fp16 option.

Unfortunately for me, memory usage did not drop compared to regular fp32 training, and training was slower too.

Came across a similar issue, #129, but it was closed before a fix was checked in. Will you still continue to work on fp16? I believe it would help many of your users (and fans!).
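For context, here is a minimal sketch of how APEX/Amp mixed-precision training is typically wired up (the tiny linear model and random data are stand-ins, not the repo's actual training loop; opt_level="O1" is the usual starting point):

```python
import torch
import torch.nn as nn
from apex import amp  # NVIDIA APEX, as referenced in the issue

model = nn.Linear(512, 512).cuda()  # stand-in for the real network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# O1 runs whitelisted ops in fp16 while keeping fp32 master weights
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

x = torch.randn(64, 512, device="cuda")
target = torch.randn(64, 512, device="cuda")

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), target)
# scale the loss so small fp16 gradients do not underflow to zero
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```

Note that O1 keeps an fp32 master copy of the weights, so on small models the savings from fp16 activations can be largely offset by that duplication.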

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 3
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

2 reactions
tannisroot commented, Oct 1, 2020

A bit unrelated, but I can’t even get it running - I just keep getting NaN errors and training shuts down.
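The NaN failures described here are a classic fp16 underflow/overflow symptom. One common mitigation is PyTorch's native torch.cuda.amp, whose GradScaler silently skips optimizer steps when gradients contain inf/NaN instead of crashing training. A minimal sketch, again with a placeholder model and data:

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 512).cuda()  # stand-in for the real network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(64, 512, device="cuda")
target = torch.randn(64, 512, device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():  # ops autocast to fp16 where safe
    loss = nn.functional.mse_loss(model(x), target)
scaler.scale(loss).backward()    # scale the loss to avoid grad underflow
scaler.step(optimizer)           # skips the step if grads contain inf/NaN
scaler.update()                  # re-tunes the loss scale each iteration
```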

1 reaction
lucidrains commented, Oct 1, 2020

@tannisroot yea, I get that feedback a lot. I think I will just remove this feature from the readme and keep it as a silent feature. Perhaps someone can help figure out what’s wrong. It has worked for me in the past, so I’m not sure what changed

Read more comments on GitHub >

Top Results From Across the Web

Fp16 training with feedforward network slower time and no ...
Fp16 should still decrease your memory footprint, even if it's by a small factor. It's possible that the decrease is so...
Read more >
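Before concluding that fp16 saves nothing, it is worth measuring peak tensor allocation directly rather than watching nvidia-smi, which also counts PyTorch's caching allocator. A small measurement sketch:

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run one full training step here, once in fp32 and once in fp16 ...
peak_mib = torch.cuda.max_memory_allocated() / 2**20
print(f"peak tensor allocation: {peak_mib:.1f} MiB")
```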
Memory and speed - Hugging Face
We present some techniques and ideas to optimize Diffusers inference for memory or speed. As a general rule, we recommend the use of...
Read more >
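As an illustration of the Hugging Face advice above, loading a Diffusers pipeline in half precision is a single argument; the model ID here is only an example:

```python
import torch
from diffusers import StableDiffusionPipeline

# load weights directly in fp16 to roughly halve GPU memory for inference
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example model ID
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()  # trades a little speed for extra memory

image = pipe("a photo of an astronaut riding a horse").images[0]
```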
Use FP16 regardless if it is slower or not - TensorRT
Use FP16 regardless if it is slower or not ... [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0...
Read more >
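For reference, opting a TensorRT engine into fp16 via the Python builder API is a single flag; whether it is actually faster depends on the GPU's fp16 throughput. This is a sketch of the TensorRT 8-style API, with a placeholder ONNX path, not a complete build script:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
parser.parse_from_file("model.onnx")  # placeholder model path

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # allow fp16 kernels where beneficial
engine = builder.build_serialized_network(network, config)
```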
Optimize TensorFlow GPU performance with the TensorFlow ...
Keep in mind that offloading computations to GPU may not always be ... is more aggressive and may reduce parallelism and use more...
Read more >
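On the TensorFlow side, mixed precision is enabled with a global Keras policy; keeping the final layer in float32 is the usual numerical-stability precaution. The tiny model is only illustrative:

```python
import tensorflow as tf

# run most layers in fp16 while variables stay in fp32
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10),
    # cast logits back to float32 so the softmax/loss stay stable
    tf.keras.layers.Activation("softmax", dtype="float32"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```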
Speed Up Model Training - PyTorch Lightning - Read the Docs
Increasing num_workers will ALSO increase your CPU memory consumption. The best thing to do is to increase the num_workers slowly and stop once...
Read more >
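And for the Lightning advice above, both knobs are ordinary constructor arguments. In this sketch, dataset and model are placeholders, and note that recent Lightning versions spell the precision flag "16-mixed" rather than 16:

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader

# raise num_workers gradually; each worker adds CPU memory overhead
loader = DataLoader(dataset, batch_size=32, num_workers=4, pin_memory=True)

# mixed-precision training via Lightning's precision flag
trainer = pl.Trainer(precision=16, max_epochs=10)
trainer.fit(model, loader)  # model is a placeholder LightningModule
```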
