Support for XGLM: How to achieve faster inference speed?
Describe a requested feature
Thanks for releasing this great library! I am currently working on deploying facebook/xglm-7.5B, which is not yet supported by parallelformers.
POLICY.md provides a comprehensive guide for parallelizing custom models, but I am a little unsure of
- which weights should be parallelized, and
- how many GPUs should be used
to get better inference speed. (A rough memory estimate is sketched right after this list.)
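For the GPU-count question, a back-of-the-envelope memory estimate is a reasonable starting point. Below is a minimal sketch, assuming the nominal 7.5B parameter count and fp16 storage (2 bytes per parameter); activations and the attention KV cache add per-GPU overhead that is not counted here:

```python
# Back-of-the-envelope weight-memory estimate for facebook/xglm-7.5B.
# Assumptions: ~7.5e9 parameters, fp16 storage (2 bytes/parameter);
# activations and the KV cache add extra per-GPU overhead not counted here.
N_PARAMS = 7.5e9
BYTES_PER_PARAM = 2  # fp16

def per_gpu_weight_gib(num_gpus: int) -> float:
    """Weight memory per GPU under even tensor-parallel sharding."""
    return N_PARAMS * BYTES_PER_PARAM / num_gpus / 1024**3

for n in (1, 2, 4):
    print(f"{n} GPU(s): ~{per_gpu_weight_gib(n):.1f} GiB of weights per GPU")
# 1 GPU(s): ~14.0 GiB, 2 GPU(s): ~7.0 GiB, 4 GPU(s): ~3.5 GiB
```

By this estimate the fp16 weights alone take about 14 GiB, so two 16 GB GPUs (roughly 7 GiB of weights each) leave headroom, while a single 16 GB GPU would be very tight. Note that tensor parallelism adds inter-GPU communication on every layer, so adding GPUs beyond what memory requires does not always reduce latency.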
Architecture of XGLM-7.5B
```
root
├── model (XGLMModel)
│   ├── embed_tokens (Embedding) weight:[256008, 4096]
│   ├── embed_positions (XGLMSinusoidalPositionalEmbedding) weight:[2050, 4096]
│   ├── layers (ModuleList)
│   │   └── 0-31 (XGLMDecoderLayer)
│   │       ├── self_attn (XGLMAttention)
│   │       │   └── k_proj, v_proj, q_proj, out_proj (Linear) weight:[4096, 4096] bias:[4096]
│   │       ├── self_attn_layer_norm, final_layer_norm (LayerNorm) weight:[4096] bias:[4096]
│   │       ├── fc1 (Linear) weight:[16384, 4096] bias:[16384]
│   │       └── fc2 (Linear) weight:[4096, 16384] bias:[4096]
│   └── layer_norm (LayerNorm) weight:[4096] bias:[4096]
└── lm_head (Linear) weight:[256008, 4096]
```
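Reading the module tree above together with the conventions in POLICY.md, a draft policy might look like the sketch below. This is an untested sketch, not the library's official XGLM policy: following the usual Megatron-style split, q_proj/k_proj/v_proj and fc1 are treated as column-parallel (attn_qkv / mlp_in), out_proj and fc2 as row-parallel (attn_out / mlp_out), and the LayerNorms, embeddings, and lm_head stay replicated on every GPU. The replace_arguments keys assume XGLMAttention exposes embed_dim and num_heads attributes, and that XGLMConfig uses d_model / attention_heads.

```python
from parallelformers.policies.base import Layer, Policy
from transformers.models.xglm.modeling_xglm import XGLMDecoderLayer


class XGLMPolicy(Policy):
    @staticmethod
    def replace_arguments(config, world_size):
        # Each rank sees only its shard of the heads / hidden size.
        return {
            "self_attn.embed_dim": config.d_model // world_size,
            "self_attn.num_heads": config.attention_heads // world_size,
        }

    @staticmethod
    def attn_qkv():
        # Column-parallel: q/k/v projections are split across GPUs.
        return [
            Layer(weight="self_attn.q_proj.weight", bias="self_attn.q_proj.bias"),
            Layer(weight="self_attn.k_proj.weight", bias="self_attn.k_proj.bias"),
            Layer(weight="self_attn.v_proj.weight", bias="self_attn.v_proj.bias"),
        ]

    @staticmethod
    def attn_out():
        # Row-parallel: outputs are all-reduced after this projection.
        return [Layer(weight="self_attn.out_proj.weight", bias="self_attn.out_proj.bias")]

    @staticmethod
    def mlp_in():
        # Column-parallel: the [16384, 4096] fc1 weight is split on dim 0.
        return [Layer(weight="fc1.weight", bias="fc1.bias")]

    @staticmethod
    def mlp_out():
        # Row-parallel: the [4096, 16384] fc2 weight is split on dim 1.
        return [Layer(weight="fc2.weight", bias="fc2.bias")]

    @staticmethod
    def original_layer_class():
        return XGLMDecoderLayer
```

If a policy along these lines works, the model could then be parallelized through the library's usual entry point, e.g. parallelize(model, num_gpus=2, fp16=True), once the policy is registered as POLICY.md describes.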
@z-bookworm I'll add now

Hi @hyunwoongko, may I have an update on this feature?