Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

No fp16 support from fast-transformers (CausalDotProduct)

See original GitHub issue

When using fp16 (for causal attention), I get the following RuntimeError: expected scalar type Float but found Half.

That is because fast-transformers' CausalDotProduct doesn't support fp16. Do you think there is any workaround? Using Float is bad news for memory usage and also disables DeepSpeed ZeRO optimizations.
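
One common pattern for ops that lack fp16 kernels (not confirmed as the fix in this thread) is to run just that op in fp32 and cast the result back, so the rest of the model can stay in half precision. A minimal sketch, with the causal-product callable left as a placeholder:

```python
import torch

def causal_dot_product_fp32(q, k, v, causal_dot_product_fn):
    # causal_dot_product_fn is a placeholder for the fp32-only kernel
    # (e.g. fast-transformers' causal dot-product).
    orig_dtype = q.dtype
    with torch.cuda.amp.autocast(enabled=False):  # stop autocast from re-casting to half
        out = causal_dot_product_fn(q.float(), k.float(), v.float())
    return out.to(orig_dtype)  # hand fp16 back to the surrounding model
```

This pays the fp32 memory cost only inside the unsupported kernel instead of forcing the whole model to Float.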

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 75 (74 by maintainers)

Top GitHub Comments

2 reactions
gulnazaki commented, Dec 14, 2020

Ooof, I changed Adam eps from 1e-8 to 1e-4 and everything is fine now (at 500 steps). I will continue training and let you know if anything bad happens. I should have thought of it earlier.
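
For reference, the change described above is just a larger eps when constructing the optimizer (the model and learning rate below are placeholders):

```python
import torch

model = torch.nn.Linear(16, 16)  # placeholder model

# A larger eps keeps Adam's denominator from underflowing in half precision,
# which is what appeared to destabilize training here.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, eps=1e-4)
```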

2 reactions
lucidrains commented, Dec 13, 2020

hopefully the codebase is both pytorch amp and nvidia apex compatible now
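
As a rough illustration of what PyTorch AMP compatibility means in practice, a minimal autocast + GradScaler training step (model, data, and loss are placeholders) looks something like:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(16, 16).cuda()        # placeholder model
optimizer = torch.optim.Adam(model.parameters(), eps=1e-4)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(8, 16, device="cuda")         # placeholder batch
target = torch.randn(8, 16, device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():               # forward pass runs in mixed precision
    loss = F.mse_loss(model(x), target)
scaler.scale(loss).backward()                 # scale loss to avoid fp16 gradient underflow
scaler.step(optimizer)                        # unscales grads, then steps
scaler.update()
```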

Read more comments on GitHub

Top Results From Across the Web

  • How To Fit a Bigger Model and Train It Faster - Hugging Face
    This section gives brief ideas on how to make training faster and support bigger models. Later sections will expand, demonstrate and elucidate each...

  • pytorch-fast-transformers - PyPI
    Provide a library with fast transformer implementations. Navigation. Project description; Release history; Download files. Project links.

  • Enabling Efficient Inference of Transformer Models at ... - arXiv
    Although PP does not help with the aggregate memory bandwidth since each micro-batch traverses the full depth of the model in sequence...

  • s9832-taking-advantage-of-mixed-precision-to ... - NVIDIA
    No architecture changes required ... Sum of FP16 values whose ratio is >2^11 is just the larger value ... apex.amp supports different optimization...

  • Performer - Pytorch
    No fp16 support from fast-transformers (CausalDotProduct) ... expected scalar type Float but found Half when using fp16 (for causal attention).
