
Support for XGLM: How to achieve faster inference speed?

See original GitHub issue

Describe a requested feature

Thanks for releasing this great library! I am working on deploying facebook/xglm-7.5B, which is not currently supported by parallelformers.

POLICY.md provides a comprehensive guide for parallelizing my own models, but I am a little unsure of

  1. which weights should be parallelized, and
  2. how many GPUs should be used

to get better inference speed. (Rough sketches for both points follow below.)
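
On point 2, a rough back-of-the-envelope estimate (an editor's sketch, not a figure from the thread): XGLM-7.5B has about 7.5 billion parameters, so the fp16 weights alone occupy roughly 15 GB. Tensor parallelism splits those weights roughly evenly across devices, and each device still needs headroom for activations and the key/value cache during generation.

# Rough per-GPU memory for the fp16 weights of facebook/xglm-7.5B under
# tensor parallelism. Activations, the KV cache and framework overhead are
# ignored, so treat these numbers as lower bounds.
PARAMS = 7.5e9          # approximate parameter count of XGLM-7.5B
BYTES_PER_PARAM = 2     # fp16

for world_size in (1, 2, 4, 8):
    weights_gib = PARAMS * BYTES_PER_PARAM / world_size / 1024**3
    print(f"{world_size} GPU(s): ~{weights_gib:.1f} GiB of weights per device")

By this estimate, two 16 GB GPUs can already hold the sliced weights, while four devices leave more comfortable headroom for long sequences and larger batches.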

Architecture of XGLM-7.5B

root
├── model (XGLMModel)
│   ├── embed_tokens (Embedding) weight: [256008, 4096]
│   ├── embed_positions (XGLMSinusoidalPositionalEmbedding) weights: [2050, 4096]
│   ├── layers (ModuleList)
│   │   └── 0-31 (XGLMDecoderLayer)
│   │       ├── self_attn (XGLMAttention)
│   │       │   └── k_proj, v_proj, q_proj, out_proj (Linear) weight: [4096, 4096] bias: [4096]
│   │       ├── self_attn_layer_norm, final_layer_norm (LayerNorm) weight: [4096] bias: [4096]
│   │       ├── fc1 (Linear) weight: [16384, 4096] bias: [16384]
│   │       └── fc2 (Linear) weight: [4096, 16384] bias: [4096]
│   └── layer_norm (LayerNorm) weight: [4096] bias: [4096]
└── lm_head (Linear) weight: [256008, 4096]
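
On point 1, here is a minimal sketch of what an XGLM policy could look like, modeled on the Policy/Layer interface described in POLICY.md and on the library's existing decoder policies. The module paths (self_attn.q_proj, fc1, fc2, ...) come from the architecture dump above; the AllReduceLinear import path, the replace= field, and the config attribute names (d_model, attention_heads) are assumptions that should be checked against an existing policy (e.g. the BART one) in the installed version of parallelformers.

from parallelformers.policies.base import Layer, Policy
from parallelformers.utils.dist_utils import AllReduceLinear  # import path may differ by version
from transformers.models.xglm.modeling_xglm import XGLMDecoderLayer


class XGLMPolicy(Policy):
    @staticmethod
    def replace_arguments(config, world_size):
        # Each GPU only keeps its slice of the attention, so shrink the
        # per-layer width and head count accordingly.
        return {
            "self_attn.embed_dim": config.d_model // world_size,
            "self_attn.num_heads": config.attention_heads // world_size,
        }

    @staticmethod
    def attn_qkv():
        # Q/K/V projections are column-sliced across GPUs.
        return [
            Layer(weight="self_attn.q_proj.weight", bias="self_attn.q_proj.bias"),
            Layer(weight="self_attn.k_proj.weight", bias="self_attn.k_proj.bias"),
            Layer(weight="self_attn.v_proj.weight", bias="self_attn.v_proj.bias"),
        ]

    @staticmethod
    def attn_out():
        # The output projection is row-sliced and its partial results are
        # summed with an all-reduce.
        return [
            Layer(
                weight="self_attn.out_proj.weight",
                bias="self_attn.out_proj.bias",
                replace=AllReduceLinear,
            ),
        ]

    @staticmethod
    def mlp_in():
        # fc1 (column-sliced)
        return [Layer(weight="fc1.weight", bias="fc1.bias")]

    @staticmethod
    def mlp_out():
        # fc2 (row-sliced, followed by all-reduce)
        return [Layer(weight="fc2.weight", bias="fc2.bias", replace=AllReduceLinear)]

    @staticmethod
    def original_layer_class():
        return XGLMDecoderLayer

The layer norms and the embedding / lm_head weights listed above are normally left replicated rather than sliced. Once a policy like this is registered (POLICY.md explains how to pass a custom policy to the library), the usual entry point is parallelize(model, num_gpus=..., fp16=True), so the GPU-count estimate above applies directly.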

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

4 reactions
hyunwoongko commented, Apr 7, 2022

@z-bookworm I’ll add it now

1 reaction
un-certainty commented, Apr 24, 2022

Hi @hyunwoongko, may I have an update on this feature?

Read more comments on GitHub >

