No fp16 support from fast-transformers (CausalDotProduct)
See original GitHub issue

When using fp16 (for causal attention) I get the following error:

RuntimeError: expected scalar type Float but found Half

That happens because fast-transformers' CausalDotProduct does not support fp16. Do you think there is any workaround? Falling back to Float is bad news for memory usage and also disables the DeepSpeed ZeRO optimizations.
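One possible workaround, until the kernel itself gains fp16 support, is to run only the causal dot product in fp32 and keep the rest of the model in half precision. The sketch below is a minimal illustration of that idea, assuming CausalDotProduct is importable from fast_transformers.causal_product (as published by the pytorch-fast-transformers package) and that torch.cuda.amp autocast is in use; causal_dot_product_fp16_safe is a hypothetical helper name, not part of either library.

```python
import torch
# Assumed import path, as provided by the pytorch-fast-transformers package.
from fast_transformers.causal_product import CausalDotProduct

def causal_dot_product_fp16_safe(q, k, v):
    """Run the causal dot product in fp32 even when the model runs under fp16/autocast.

    q, k, v: tensors of shape (batch, heads, seq_len, dim), possibly half precision.
    """
    orig_dtype = q.dtype
    # Disable autocast locally and feed the CUDA kernel fp32 tensors,
    # since it only accepts Float.
    with torch.cuda.amp.autocast(enabled=False):
        out = CausalDotProduct.apply(q.float(), k.float(), v.float())
    # Cast the result back so the surrounding fp16 graph is unaffected.
    return out.to(orig_dtype)
```

With this kind of wrapper the extra memory cost is limited to the fp32 copies of q, k, v and the kernel output, rather than the whole model being kept in Float.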
Issue Analytics
- State:
- Created: 3 years ago
- Reactions: 2
- Comments: 75 (74 by maintainers)
Top Results From Across the Web
- How To Fit a Bigger Model and Train It Faster - Hugging Face: This section gives brief ideas on how to make training faster and support bigger models. Later sections will expand, demonstrate and elucidate each...
- pytorch-fast-transformers - PyPI: Provide a library with fast transformer implementations. Navigation. Project description; Release history; Download files. Project links.
- Enabling Efficient Inference of Transformer Models at ... - arXiv: Although PP does not help with the aggregate memory bandwidth, since each micro-batch traverses the full depth of the model in sequence...
- s9832-taking-advantage-of-mixed-precision-to ... - NVIDIA: No architecture changes required ... Sum of FP16 values whose ratio is >2^11 is just the larger value ... apex.amp supports different optimization...
- Performer - Pytorch: No fp16 support from fast-transformers (CausalDotProduct) ... expected scalar type Float but found Half when using fp16 (for causal attention).
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Ooof, I changed Adam eps from 1e-8 to 1e-4 and everything is fine now (at 500 steps). I will continue training and let you know if anything bad happens. I should have thought of it earlier.
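For reference, that change amounts to nothing more than passing a larger eps to the optimizer. The sketch below assumes a plain torch.optim.Adam with a placeholder model and an arbitrary learning rate; only the eps value comes from the comment above.

```python
import torch

model = torch.nn.Linear(512, 512)  # placeholder model, for illustration only

# The fp16 format cannot represent 1e-8 (its smallest subnormal is ~6e-8),
# so a tiny eps can effectively vanish from the Adam denominator under
# half-precision training. The comment above reports that eps=1e-4
# stabilized training.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, eps=1e-4)
```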
Hopefully the codebase is now compatible with both PyTorch AMP and NVIDIA Apex.
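One common way to keep a single code path compatible with both backends is to use Apex's float_function when Apex is installed and fall back to a manual autocast-disabled wrapper otherwise. The sketch below illustrates that pattern under those assumptions; wrap_fp32 is a hypothetical name, and this is not necessarily how the repository implements it.

```python
import torch

try:
    from apex import amp
    APEX_AVAILABLE = True
except ImportError:
    APEX_AVAILABLE = False

def wrap_fp32(fn):
    """Return a version of `fn` that always executes in fp32,
    whichever mixed-precision backend is active."""
    if APEX_AVAILABLE:
        # Apex casts the arguments to fp32 (and the output back) for us.
        return amp.float_function(fn)

    def wrapped(*args):
        # Under torch.cuda.amp, disable autocast and cast tensor args manually.
        with torch.cuda.amp.autocast(enabled=False):
            args = [a.float() if torch.is_tensor(a) else a for a in args]
            return fn(*args)
    return wrapped

# Example usage (assuming the fast-transformers kernel is available):
# causal_product_fn = wrap_fp32(CausalDotProduct.apply)
```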