Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

--fp16 Slower & Does Not Reduce Memory Use

See original GitHub issue

Hey there @lucidrains,

Came across your incredible work and immediately tried it out on my RTX 2070! Since training will take some time and require a lot of memory, I was relieved that we can use APEX/Amp to train the model simply by adding the --fp16 option.

Unfortunately for me, memory usage did not drop compared to regular fp32 training, and training was slower too.

Came across a similar issue, #129, but it was closed before a fix was checked in. Will you still continue to work on fp16? I believe it would help many of your users (and fans!).
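For context, here is a minimal sketch of how APEX/Amp mixed-precision training is typically wired up (the tiny linear model and random data are stand-ins, not the repo's actual training loop; opt_level="O1" is the usual starting point):

```python
import torch
import torch.nn as nn
from apex import amp  # NVIDIA APEX, as referenced in the issue

model = nn.Linear(512, 512).cuda()  # stand-in for the real network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# O1 runs whitelisted ops in fp16 while keeping fp32 master weights
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

x = torch.randn(64, 512, device="cuda")
target = torch.randn(64, 512, device="cuda")

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), target)
# scale the loss so small fp16 gradients do not underflow to zero
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```

Note that O1 keeps an fp32 master copy of the weights, so on small models the savings from fp16 activations can be largely offset by that duplication.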

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 3
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

2 reactions
tannisroot commented, Oct 1, 2020

A bit unrelated, but I can’t even get it running - I just keep getting NaN errors and training shuts down.
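The NaN failures described here are a classic fp16 underflow/overflow symptom. One common mitigation is PyTorch's native torch.cuda.amp, whose GradScaler silently skips optimizer steps when gradients contain inf/NaN instead of crashing training. A minimal sketch, again with a placeholder model and data:

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 512).cuda()  # stand-in for the real network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(64, 512, device="cuda")
target = torch.randn(64, 512, device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():  # ops autocast to fp16 where safe
    loss = nn.functional.mse_loss(model(x), target)
scaler.scale(loss).backward()    # scale the loss to avoid grad underflow
scaler.step(optimizer)           # skips the step if grads contain inf/NaN
scaler.update()                  # re-tunes the loss scale each iteration
```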

1 reaction
lucidrains commented, Oct 1, 2020

@tannisroot yea, I get that feedback a lot. I think I will just remove this feature from the readme and keep it as a silent feature. Perhaps someone can help figure out what’s wrong. It has worked for me in the past, so I’m not sure what changed

Read more comments on GitHub >

Top Results From Across the Web

Fp16 training with feedforward network slower time and no ...
Fp16 should still decrease your memory footprint, even if it's by a small factor. It's possible that the decrease is so...
Read more >
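Before concluding that fp16 saves nothing, it is worth measuring peak tensor allocation directly rather than watching nvidia-smi, which also counts PyTorch's caching allocator. A small measurement sketch:

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run one full training step here, once in fp32 and once in fp16 ...
peak_mib = torch.cuda.max_memory_allocated() / 2**20
print(f"peak tensor allocation: {peak_mib:.1f} MiB")
```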
Memory and speed - Hugging Face
We present some techniques and ideas to optimize Diffusers inference for memory or speed. As a general rule, we recommend the use of...
Read more >
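As an illustration of the Hugging Face advice above, loading a Diffusers pipeline in half precision is a single argument; the model ID here is only an example:

```python
import torch
from diffusers import StableDiffusionPipeline

# load weights directly in fp16 to roughly halve GPU memory for inference
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example model ID
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()  # trades a little speed for extra memory

image = pipe("a photo of an astronaut riding a horse").images[0]
```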
Use FP16 regardless if it is slower or not - TensorRT
Use FP16 regardless if it is slower or not ... [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0...
Read more >
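For reference, opting a TensorRT engine into fp16 via the Python builder API is a single flag; whether it is actually faster depends on the GPU's fp16 throughput. This is a sketch of the TensorRT 8-style API, with a placeholder ONNX path, not a complete build script:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
parser.parse_from_file("model.onnx")  # placeholder model path

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # allow fp16 kernels where beneficial
engine = builder.build_serialized_network(network, config)
```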
Optimize TensorFlow GPU performance with the TensorFlow ...
Keep in mind that offloading computations to GPU may not always be ... is more aggressive and may reduce parallelism and use more...
Read more >
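On the TensorFlow side, mixed precision is enabled with a global Keras policy; keeping the final layer in float32 is the usual numerical-stability precaution. The tiny model is only illustrative:

```python
import tensorflow as tf

# run most layers in fp16 while variables stay in fp32
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10),
    # cast logits back to float32 so the softmax/loss stay stable
    tf.keras.layers.Activation("softmax", dtype="float32"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```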
Speed Up Model Training - PyTorch Lightning - Read the Docs
Increasing num_workers will ALSO increase your CPU memory consumption. The best thing to do is to increase the num_workers slowly and stop once...
Read more >
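And for the Lightning advice above, both knobs are ordinary constructor arguments. In this sketch, dataset and model are placeholders, and note that recent Lightning versions spell the precision flag "16-mixed" rather than 16:

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader

# raise num_workers gradually; each worker adds CPU memory overhead
loader = DataLoader(dataset, batch_size=32, num_workers=4, pin_memory=True)

# mixed-precision training via Lightning's precision flag
trainer = pl.Trainer(precision=16, max_epochs=10)
trainer.fit(model, loader)  # model is a placeholder LightningModule
```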
