We made a toolkit that can parallelize almost all Hugging Face models. But we have some questions!
We recently developed an open-source library called parallelformers
(https://github.com/tunib-ai/parallelformers) and have a few questions, so we are writing an issue here.
Q. Our logo is an homage to the Hugging Face logo. It is not exactly the same corporate identity (CI), but an image derived from the Unicode emoji. Will that be a problem?
Q. What do you think about collaboration? We could contribute model parallelization for all models in Hugging Face Transformers.
The following is what I posted on Reddit to promote our open-source project.
Hello, I am writing to inform you about the release of Parallelformers (https://github.com/tunib-ai/parallelformers), a model parallelization library developed at TUNiB. Parallelformers is a toolkit that supports inference parallelism for 68 models in Hugging Face Transformers with a single line of code.
Previously, DeepSpeed-Inference was the main toolkit for parallelizing model inference, but it had several limitations:
(1) Because of its process flow, it could not be deployed behind a web server.
(2) It lacked integration with Hugging Face Transformers, which has become the de facto standard for natural language processing tools (DeepSpeed-Inference supports only 3 models).
(3) Because parallelization starts from the GPU state, all of the model's parameters had to be placed on the GPU before parallelization.
Parallelformers solves a number of the problems in DeepSpeed-Inference. Using this toolkit internally, we were able to easily deploy a large model to our web server, reducing deployment costs by up to 3-5x. More detailed information and source code can be found on GitHub. Thanks!
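To make the idea concrete, here is a toy NumPy sketch of the column-wise tensor parallelism that inference-parallelization toolkits like this generally rely on. This is only an illustration of the technique, not parallelformers' actual implementation; the "devices" here are just array shards.

```python
import numpy as np

# Toy sketch of intra-layer (tensor) model parallelism for one linear layer:
# the weight matrix is split column-wise across "devices", each device
# computes a partial output, and the slices are concatenated at the end.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # batch of activations
W = rng.standard_normal((8, 16))   # full weight matrix

num_devices = 2
shards = np.split(W, num_devices, axis=1)  # one column-shard per "device"

# Each "device" computes its slice of the output independently.
partial_outputs = [x @ shard for shard in shards]
y_parallel = np.concatenate(partial_outputs, axis=1)

# The sharded result matches the unsharded matmul.
y_serial = x @ W
assert np.allclose(y_parallel, y_serial)
```

In a real toolkit the shards live on different GPUs and the concatenation is a collective communication step; the sketch above only shows why the split computation is mathematically equivalent.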
Issue Analytics
- State:
- Created: 2 years ago
- Reactions: 2
- Comments: 29 (29 by maintainers)
Top GitHub Comments
Hello @hyunwoongko, thanks a lot for sharing, this is a really cool project! No problem at all regarding the image homage (really cool logo by the way!)
I’m pinging @stas00 who has led the efforts of model parallelization and DeepSpeed integration on our side and would probably be interested. Also pinging @sgugger as he has done some similar work.
@stas00 Sorry for the delay on this work. We are also building a public large-scale model that covers Asian languages. I've been very busy lately, so I haven't had much time to contribute to Hugging Face. I will work on it as soon as possible.