MarianMT: doubling batch_size has no effect on time taken
Environment info
- transformers version: 4.15.0
- Platform: Linux-5.4.0-97-generic-x86_64-with-debian-bullseye-sid
- Python version: 3.6.13
- PyTorch version (GPU?): 1.10.1 (True)
- Tensorflow version (GPU?): 2.6.2 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Who can help
Information
Model I am using (Bert, XLNet …): MarianMTModel, MarianTokenizer
The problem arises when using:
- [ ] the official example scripts: (give details below)
- [x] my own modified scripts: (give details below)
Problem: Doubling the batch size does not reduce the total time taken to do back-translation with MarianMTModel at all.
The task I am working on is:
- [ ] an official GLUE/SQuAD task: (give the name)
- [x] my own task or dataset: (give details below)
I am trying to do back-translation for a given set of tweets (my own collection) using MarianMTModel and MarianTokenizer. With batch_size=128 it shows 1 hour at 4 s/it on an A6000 GPU, whereas with batch_size=256 it still shows 1 hour at 7.7-8 s/it. In both cases that works out to roughly 31 ms per tweet, so throughput does not improve. FYI, I do see the number of steps halved in the tqdm bar.
To reproduce
Steps to reproduce the behavior:
I am using the script below:
import torch
from tqdm import tqdm
from transformers import MarianMTModel, MarianTokenizer

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# English to Romance languages
target_model_name = 'Helsinki-NLP/opus-mt-en-ROMANCE'
target_tokenizer = MarianTokenizer.from_pretrained(target_model_name)
target_model = MarianMTModel.from_pretrained(target_model_name).cuda()

# Romance languages to English
en_model_name = 'Helsinki-NLP/opus-mt-ROMANCE-en'
en_tokenizer = MarianTokenizer.from_pretrained(en_model_name)
en_model = MarianMTModel.from_pretrained(en_model_name).cuda()

def translate(texts, model, tokenizer, language="fr", num_beams=1):
    # Prepend the target-language token (e.g. ">>fr<<") expected by the multilingual model
    template = lambda text: f"{text}" if language == "en" else f">>{language}<< {text}"
    src_texts = [template(text) for text in texts]
    encoded = tokenizer.prepare_seq2seq_batch(src_texts, return_tensors='pt').to(device)
    translated = model.generate(**encoded, do_sample=True, max_length=256, top_k=0,
                                num_beams=num_beams, temperature=0.7)
    translated_texts = tokenizer.batch_decode(translated, skip_special_tokens=True)
    return translated_texts

def back_translate(texts, source_lang="en", target_lang="fr", num_beams=1):
    fr_texts = translate(texts, target_model, target_tokenizer, language=target_lang, num_beams=num_beams)
    back_translated_texts = translate(fr_texts, en_model, en_tokenizer, language=source_lang, num_beams=num_beams)
    return back_translated_texts

# df1: pandas DataFrame with a 'Tweet' column
batch_size = 256  # 128
for i in tqdm(range(0, len(df1), batch_size)):
    rows = df1[i:i+batch_size]
    aug_text = back_translate(rows['Tweet'].tolist(), source_lang="en", target_lang="fr", num_beams=1)
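To compare the two batch sizes without tokenization and logging overhead in the measurement, one option is to time only the back-translation loop with CUDA synchronization. A rough sketch, assuming the models, back_translate, and df1 from the script above (the 1,024-tweet sample size is only an illustrative choice):

import time

def time_back_translation(texts, batch_size):
    back_translate(texts[:batch_size])      # warm-up pass so one-off CUDA setup is not timed
    torch.cuda.synchronize()
    start = time.perf_counter()
    for i in range(0, len(texts), batch_size):
        back_translate(texts[i:i + batch_size])
    torch.cuda.synchronize()                # wait for queued GPU work before stopping the clock
    elapsed = time.perf_counter() - start
    print(f"batch_size={batch_size}: {elapsed:.1f} s total, "
          f"{elapsed / len(texts) * 1000:.1f} ms per tweet")

sample = df1['Tweet'].tolist()[:1024]       # fixed subset so both runs see identical inputs
for bs in (128, 256):
    time_back_translation(sample, bs)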
Expected behavior
With double the batch_size, the total time should drop to roughly half, since the number of steps is halved.
From the code snippet, it looks like the GPU is under-utilised here: while a batch of text is being tokenized, the GPU just sits idle. Maybe try using a Dataset/DataLoader to prepare the batches in the background while the model is translating other batches, so you can take advantage of async execution (see the sketch below).
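For example, a minimal sketch of that idea for the en→fr leg only (not from the original comment; num_workers, batch_size, and the padding settings are illustrative choices, and the fr→en leg would follow the same pattern):

from torch.utils.data import DataLoader

texts = [f">>fr<< {t}" for t in df1['Tweet'].tolist()]

def collate(batch_texts):
    # Tokenization runs in DataLoader worker processes, overlapping with GPU generation
    return target_tokenizer(batch_texts, return_tensors='pt', padding=True, truncation=True)

loader = DataLoader(texts, batch_size=128, num_workers=2, collate_fn=collate)

fr_texts = []
for batch in loader:
    batch = {k: v.to(device) for k, v in batch.items()}
    out = target_model.generate(**batch, do_sample=True, max_length=256, top_k=0, temperature=0.7)
    fr_texts.extend(target_tokenizer.batch_decode(out, skip_special_tokens=True))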
You could also try pipeline batching: https://huggingface.co/docs/transformers/main_classes/pipelines#pipeline-batching
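A rough sketch of what pipeline batching could look like here, assuming the same checkpoints as in the report (the batch_size value is only an example):

from transformers import pipeline

en_to_fr = pipeline("translation", model="Helsinki-NLP/opus-mt-en-ROMANCE", device=0)
fr_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-ROMANCE-en", device=0)

tweets = [f">>fr<< {t}" for t in df1['Tweet'].tolist()]
fr = [o['translation_text'] for o in en_to_fr(tweets, batch_size=128, max_length=256)]
back = [o['translation_text'] for o in fr_to_en(fr, batch_size=128, max_length=256)]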
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.