How to do inference using multiple GPUs for Styleformer
I am using this model to run inference on 1 million data points on A100 GPUs (4 GPUs). I am launching an inference.py script using Google's Vertex AI container. How can I make the inference code utilise all 4 GPUs so that inference is super-fast?
Here is the code I use in inference.py:
from styleformer import Styleformer
import torch
import warnings

warnings.filterwarnings("ignore")

# style = [0=Casual to Formal, 1=Formal to Casual, 2=Active to Passive, 3=Passive to Active etc..]
sf = Styleformer(style=1)

def set_seed(seed):
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(1212)

source_sentences = [
    "I would love to meet attractive men in town",
    "Please leave the room now",
    "It is a delicious icecream",
    "I am not paying this kind of money for that nonsense",
    "He is on cocaine and he cannot be trusted with this",
    "He is a very nice man and has a charming personality",
    "Let us go out for dinner",
    "We went to Barcelona for the weekend. We have a lot of things to tell you.",
]

for source_sentence in source_sentences:
    # inference_on = [0=Regular model On CPU, 1=Regular model On GPU, 2=Quantized model On CPU]
    target_sentence = sf.transfer(source_sentence, inference_on=1,
                                  quality_filter=0.95, max_candidates=5)
    print("[Formal] ", source_sentence)
    if target_sentence is not None:
        print("[Casual] ", target_sentence)
    else:
        print("No good quality transfers available !")
    print("-" * 100)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@pratikchhapolika It sounds like you’ll need to fire up a separate process for each GPU and pass in inference_on=0, inference_on=1, inference_on=2, and inference_on=3, respectively, using multiprocessing.
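A minimal sketch of that one-process-per-GPU suggestion, using torch.multiprocessing. It takes the comment at its word that inference_on selects the CUDA device index, which differs from the 0=CPU/1=GPU mapping in the issue's own code comment, so verify against your Styleformer version; NUM_GPUS, worker, and the round-robin sharding are illustrative, not Styleformer API:

import torch.multiprocessing as mp
from styleformer import Styleformer

NUM_GPUS = 4  # assumption: four visible CUDA devices

def worker(gpu_id, shards, return_dict):
    # Each spawned process builds its own Styleformer instance.
    sf = Styleformer(style=1)
    results = []
    for sentence in shards[gpu_id]:
        # inference_on=gpu_id follows the suggestion above; check your
        # version's mapping (the released code documents 0=CPU, 1=GPU).
        results.append(sf.transfer(sentence, inference_on=gpu_id,
                                   quality_filter=0.95, max_candidates=5))
    return_dict[gpu_id] = results

if __name__ == "__main__":
    sentences = [
        "I would love to meet attractive men in town",
        "Please leave the room now",
    ] * 4  # stand-in for the real 1M-sentence dataset
    # Round-robin shard the data, one shard per GPU.
    shards = [sentences[i::NUM_GPUS] for i in range(NUM_GPUS)]
    manager = mp.Manager()
    return_dict = manager.dict()
    # mp.spawn calls worker(rank, *args) once per process, rank = 0..3.
    mp.spawn(worker, args=(shards, return_dict), nprocs=NUM_GPUS)
    for gpu_id in range(NUM_GPUS):
        print(f"GPU {gpu_id}: {len(return_dict[gpu_id])} transfers done")

Each process pays the model-load cost once and then works through its shard independently, which sidesteps the fact that sf.transfer() has no batch or multi-device mode.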
@PrithivirajDamodaran What I would like to know is how one can batchify Styleformer inference tasks to make efficient use of GPUs that have 48GB or 80GB each.
@PrithivirajDamodaran How’s the batch patch coming along?
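On the batching question above: since sf.transfer() handles one sentence at a time, one way to batchify is to drop down to the underlying Hugging Face seq2seq checkpoint and call generate() on padded batches. This is a sketch, not Styleformer's API: the checkpoint name and task prefix are assumptions based on the Styleformer model cards and source, and the sketch skips Styleformer's quality_filter re-ranking entirely.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed style=1 (Formal to Casual) checkpoint; verify on the Hub.
CKPT = "prithivida/formal_to_informal_styletransfer"
tokenizer = AutoTokenizer.from_pretrained(CKPT)
model = AutoModelForSeq2SeqLM.from_pretrained(CKPT).to("cuda").eval()

def transfer_batch(sentences, batch_size=64):
    outputs = []
    for i in range(0, len(sentences), batch_size):
        # Task prefix assumed from Styleformer's source.
        batch = ["transfer Formal to Casual: " + s
                 for s in sentences[i:i + batch_size]]
        enc = tokenizer(batch, return_tensors="pt",
                        padding=True, truncation=True).to("cuda")
        with torch.no_grad():
            generated = model.generate(**enc, max_length=64, num_beams=5)
        outputs.extend(tokenizer.batch_decode(generated,
                                              skip_special_tokens=True))
    return outputs

print(transfer_batch(["Please leave the room now",
                      "Let us go out for dinner"]))

Larger batch_size values are what actually fill a 48GB or 80GB card; tune it until GPU memory is close to full for the chosen max_length.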