Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

No fp16 support from fast-transformers (CausalDotProduct)

See original GitHub issue

When using fp16 (for causal attention), I get the following RuntimeError: expected scalar type Float but found Half.

That is because fast-transformers' CausalDotProduct doesn't support fp16. Do you think there is any workaround? Using Float is bad news for memory usage and also disables DeepSpeed ZeRO optimizations.
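
One common pattern for ops that lack fp16 kernels (not confirmed as the fix in this thread) is to run just that op in fp32 and cast the result back, so the rest of the model can stay in half precision. A minimal sketch, with the causal-product callable left as a placeholder:

```python
import torch

def causal_dot_product_fp32(q, k, v, causal_dot_product_fn):
    # causal_dot_product_fn is a placeholder for the fp32-only kernel
    # (e.g. fast-transformers' causal dot-product).
    orig_dtype = q.dtype
    with torch.cuda.amp.autocast(enabled=False):  # stop autocast from re-casting to half
        out = causal_dot_product_fn(q.float(), k.float(), v.float())
    return out.to(orig_dtype)  # hand fp16 back to the surrounding model
```

This pays the fp32 memory cost only inside the unsupported kernel instead of forcing the whole model to Float.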

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 75 (74 by maintainers)

Top GitHub Comments

2 reactions
gulnazaki commented, Dec 14, 2020

Ooof, I changed Adam eps from 1e-8 to 1e-4 and everything is fine now (at 500 steps). I will continue training and let you know if anything bad happens. I should have thought of it earlier.
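
For reference, the change described above is just a larger eps when constructing the optimizer (the model and learning rate below are placeholders):

```python
import torch

model = torch.nn.Linear(16, 16)  # placeholder model

# A larger eps keeps Adam's denominator from underflowing in half precision,
# which is what appeared to destabilize training here.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, eps=1e-4)
```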

2 reactions
lucidrains commented, Dec 13, 2020

hopefully the codebase is both pytorch amp and nvidia apex compatible now
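
As a rough illustration of what PyTorch AMP compatibility means in practice, a minimal autocast + GradScaler training step (model, data, and loss are placeholders) looks something like:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(16, 16).cuda()        # placeholder model
optimizer = torch.optim.Adam(model.parameters(), eps=1e-4)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(8, 16, device="cuda")         # placeholder batch
target = torch.randn(8, 16, device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():               # forward pass runs in mixed precision
    loss = F.mse_loss(model(x), target)
scaler.scale(loss).backward()                 # scale loss to avoid fp16 gradient underflow
scaler.step(optimizer)                        # unscales grads, then steps
scaler.update()
```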

Read more comments on GitHub

Top Results From Across the Web

  • How To Fit a Bigger Model and Train It Faster - Hugging Face
    This section gives brief ideas on how to make training faster and support bigger models. Later sections will expand, demonstrate and elucidate each...

  • pytorch-fast-transformers - PyPI
    Provide a library with fast transformer implementations. Navigation. Project description; Release history; Download files. Project links.

  • Enabling Efficient Inference of Transformer Models at ... - arXiv
    Although PP does not help with the aggregate memory bandwidth since each micro-batch traverses the full depth of the model in sequence...

  • s9832-taking-advantage-of-mixed-precision-to ... - NVIDIA
    No architecture changes required ... Sum of FP16 values whose ratio is >2^11 is just the larger value ... apex.amp supports different optimization...

  • Performer - Pytorch
    No fp16 support from fast-transformers (CausalDotProduct) ... expected scalar type Float but found Half when using fp16 (for causal attention).
