
MarianMT: doubling batch_size has no effect on time taken

See original GitHub issue

Environment info

  • transformers version: 4.15.0
  • Platform: Linux-5.4.0-97-generic-x86_64-with-debian-bullseye-sid
  • Python version: 3.6.13
  • PyTorch version (GPU?): 1.10.1 (True)
  • Tensorflow version (GPU?): 2.6.2 (False)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help

@patrickvonplaten

Information

Model I am using (Bert, XLNet …): MarianMTModel, MarianTokenizer

The problem arises when using:

  • the official example scripts: (give details below)
  • [Y] my own modified scripts: (give details below)

Problem: Doubling the batch size does not reduce the time taken for back-translation with MarianMTModel at all.

The tasks I am working on are:

  • an official GLUE/SQuAD task: (give the name)
  • [Y] my own task or dataset: (give details below)

I am trying to do back-translation for a given set of tweets (my own collection) using MarianMTModel and MarianTokenizer. With batch_size=128, tqdm estimates 1 hour at 4 s/it on an A6000 GPU, whereas with batch_size=256 it still estimates 1 hour, at 7.7-8 s/it. FYI, I do see the number of steps halved in the tqdm bar.
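As a back-of-the-envelope check (a rough sketch based only on the approximate iteration times quoted above), the throughput barely changes when the batch size doubles, which is why the overall estimate stays at about an hour:

# Rough throughput comparison from the tqdm readings above (approximate values)
throughput_128 = 128 / 4.0     # ~32 tweets/s at batch_size=128, 4 s/it
throughput_256 = 256 / 7.85    # ~32.6 tweets/s at batch_size=256, 7.7-8 s/it
print(throughput_128, throughput_256)  # nearly identical, so the total time stays ~1 hour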

To reproduce

Steps to reproduce the behavior:

I am using the script below:

import torch
from tqdm import tqdm
from transformers import MarianMTModel, MarianTokenizer

# Use the GPU if one is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# English to Romance languages
target_model_name = 'Helsinki-NLP/opus-mt-en-ROMANCE'
target_tokenizer = MarianTokenizer.from_pretrained(target_model_name)
target_model = MarianMTModel.from_pretrained(target_model_name).cuda()

# Romance languages to English
en_model_name = 'Helsinki-NLP/opus-mt-ROMANCE-en'
en_tokenizer = MarianTokenizer.from_pretrained(en_model_name)
en_model = MarianMTModel.from_pretrained(en_model_name).cuda()


def translate(texts, model, tokenizer, language="fr", num_beams=1):
    # Prefix each text with the target-language token expected by the multilingual Marian model
    template = lambda text: f"{text}" if language == "en" else f">>{language}<< {text}"
    src_texts = [template(text) for text in texts]
    # Tokenize the whole batch on CPU, then move the tensors to the GPU
    encoded = tokenizer.prepare_seq2seq_batch(src_texts, return_tensors='pt').to(device)
    # Sample translations; pass num_beams through instead of hard-coding 1
    translated = model.generate(**encoded, do_sample=True, max_length=256, top_k=0, num_beams=num_beams, temperature=0.7)
    translated_texts = tokenizer.batch_decode(translated, skip_special_tokens=True)
    return translated_texts


def back_translate(texts, source_lang="en", target_lang="fr", num_beams=1):
    fr_texts = translate(texts, target_model, target_tokenizer, language=target_lang, num_beams=num_beams)
    back_translated_texts = translate(fr_texts, en_model, en_tokenizer, language=source_lang, num_beams=num_beams)
    return back_translated_texts


batch_size = 256  # 128
# df1 is assumed to be a DataFrame of tweets with a 'Tweet' column, loaded earlier
for i in tqdm(range(0, len(df1), batch_size)):
    rows = df1[i:i+batch_size]
    aug_text = back_translate(rows['Tweet'].tolist(), source_lang="en", target_lang="fr", num_beams=1)

Expected behavior

With double the batch_size, the time taken should roughly halve.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
patil-suraj commented, Apr 11, 2022

From the code snippet, it looks like the GPU is under-utilised here: while the text is being encoded, the GPU is just sitting idle. Maybe try using a Dataset/DataLoader to prepare the batches in the background while the model is translating other batches, so you can take advantage of async execution.
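A minimal sketch of that suggestion (illustrative only, assuming df1 is the same DataFrame with a 'Tweet' column used in the snippet above): a DataLoader with a tokenizing collate_fn prepares the next batch on CPU workers while the GPU is busy generating.

from torch.utils.data import DataLoader

def collate(texts):
    # Runs in the DataLoader worker processes, so tokenization overlaps with GPU generation
    src_texts = [f">>fr<< {t}" for t in texts]
    return target_tokenizer(src_texts, return_tensors='pt', padding=True, truncation=True)

loader = DataLoader(df1['Tweet'].tolist(), batch_size=256, collate_fn=collate, num_workers=2)

for batch in tqdm(loader):
    batch = {k: v.to(device) for k, v in batch.items()}
    with torch.no_grad():
        translated = target_model.generate(**batch, do_sample=True, max_length=256, top_k=0, temperature=0.7)
    fr_texts = target_tokenizer.batch_decode(translated, skip_special_tokens=True)
    # ...then translate fr_texts back to English with en_model/en_tokenizer in the same way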

You could also try pipeline batching: https://huggingface.co/docs/transformers/main_classes/pipelines#pipeline-batching
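For example, something along these lines (an illustrative sketch, not tested on this exact setup; the batch_size argument at call time enables the internal batching described in those docs):

from transformers import pipeline

# Translation pipeline on the GPU (device=0), reusing the model name from the snippet above
en_to_fr = pipeline("translation", model="Helsinki-NLP/opus-mt-en-ROMANCE", device=0)

tweets = [f">>fr<< {t}" for t in df1['Tweet'].tolist()]
# batch_size controls how many inputs the pipeline feeds to the model per forward pass
outputs = en_to_fr(tweets, batch_size=128, max_length=256)
fr_texts = [out['translation_text'] for out in outputs]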

0 reactions
github-actions[bot] commented, May 5, 2022

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

