MarianMT: doubling batch_size has no effect on time taken
Environment info
- transformers version: 4.15.0
- Platform: Linux-5.4.0-97-generic-x86_64-with-debian-bullseye-sid
- Python version: 3.6.13
- PyTorch version (GPU?): 1.10.1 (True)
- Tensorflow version (GPU?): 2.6.2 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Who can help
Information
Model I am using (Bert, XLNet …): MarianMTModel, MarianTokenizer
The problem arises when using:
- [ ] the official example scripts: (give details below)
- [x] my own modified scripts: (give details below)
Problem: Doubling the batch size does not reduce the total time taken to do back-translation with MarianMTModel at all.
The task I am working on is:
- [ ] an official GLUE/SQuAD task: (give the name)
- [x] my own task or dataset: (give details below)
I am trying to do back-translation for a given set of tweets (my own collection) using MarianMTModel and MarianTokenizer. With batch_size=128 it shows 1 hour at 4 s/it on an A6000 GPU, whereas with batch_size=256 it still shows 1 hour at 7.7-8 s/it. In both cases that works out to roughly 31 ms per tweet, so throughput does not improve. FYI, I do see the number of steps halved in the tqdm bar.
To reproduce
Steps to reproduce the behavior:
I am using the script below:
import torch
from tqdm import tqdm
from transformers import MarianMTModel, MarianTokenizer

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# English to Romance languages
target_model_name = 'Helsinki-NLP/opus-mt-en-ROMANCE'
target_tokenizer = MarianTokenizer.from_pretrained(target_model_name)
target_model = MarianMTModel.from_pretrained(target_model_name).cuda()

# Romance languages to English
en_model_name = 'Helsinki-NLP/opus-mt-ROMANCE-en'
en_tokenizer = MarianTokenizer.from_pretrained(en_model_name)
en_model = MarianMTModel.from_pretrained(en_model_name).cuda()

def translate(texts, model, tokenizer, language="fr", num_beams=1):
    # Prepend the target-language token (e.g. ">>fr<<") expected by the multilingual model
    template = lambda text: f"{text}" if language == "en" else f">>{language}<< {text}"
    src_texts = [template(text) for text in texts]
    encoded = tokenizer.prepare_seq2seq_batch(src_texts, return_tensors='pt').to(device)
    translated = model.generate(**encoded, do_sample=True, max_length=256, top_k=0,
                                num_beams=num_beams, temperature=0.7)
    translated_texts = tokenizer.batch_decode(translated, skip_special_tokens=True)
    return translated_texts

def back_translate(texts, source_lang="en", target_lang="fr", num_beams=1):
    fr_texts = translate(texts, target_model, target_tokenizer, language=target_lang, num_beams=num_beams)
    back_translated_texts = translate(fr_texts, en_model, en_tokenizer, language=source_lang, num_beams=num_beams)
    return back_translated_texts

# df1: pandas DataFrame with a 'Tweet' column
batch_size = 256  # 128
for i in tqdm(range(0, len(df1), batch_size)):
    rows = df1[i:i+batch_size]
    aug_text = back_translate(rows['Tweet'].tolist(), source_lang="en", target_lang="fr", num_beams=1)
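To compare the two batch sizes without tokenization and logging overhead in the measurement, one option is to time only the back-translation loop with CUDA synchronization. A rough sketch, assuming the models, back_translate, and df1 from the script above (the 1,024-tweet sample size is only an illustrative choice):

import time

def time_back_translation(texts, batch_size):
    back_translate(texts[:batch_size])      # warm-up pass so one-off CUDA setup is not timed
    torch.cuda.synchronize()
    start = time.perf_counter()
    for i in range(0, len(texts), batch_size):
        back_translate(texts[i:i + batch_size])
    torch.cuda.synchronize()                # wait for queued GPU work before stopping the clock
    elapsed = time.perf_counter() - start
    print(f"batch_size={batch_size}: {elapsed:.1f} s total, "
          f"{elapsed / len(texts) * 1000:.1f} ms per tweet")

sample = df1['Tweet'].tolist()[:1024]       # fixed subset so both runs see identical inputs
for bs in (128, 256):
    time_back_translation(sample, bs)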
Expected behavior
With double the batch_size, the total time should drop to roughly half, since the number of steps is halved.
From the code snippet, it looks like the GPU is under-utilised here: while a batch of text is being tokenized, the GPU just sits idle. Maybe try using a Dataset/DataLoader to prepare the batches in the background while the model is translating other batches, so you can take advantage of async execution (see the sketch below).
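For example, a minimal sketch of that idea for the en→fr leg only (not from the original comment; num_workers, batch_size, and the padding settings are illustrative choices, and the fr→en leg would follow the same pattern):

from torch.utils.data import DataLoader

texts = [f">>fr<< {t}" for t in df1['Tweet'].tolist()]

def collate(batch_texts):
    # Tokenization runs in DataLoader worker processes, overlapping with GPU generation
    return target_tokenizer(batch_texts, return_tensors='pt', padding=True, truncation=True)

loader = DataLoader(texts, batch_size=128, num_workers=2, collate_fn=collate)

fr_texts = []
for batch in loader:
    batch = {k: v.to(device) for k, v in batch.items()}
    out = target_model.generate(**batch, do_sample=True, max_length=256, top_k=0, temperature=0.7)
    fr_texts.extend(target_tokenizer.batch_decode(out, skip_special_tokens=True))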
You could also try pipeline batching: https://huggingface.co/docs/transformers/main_classes/pipelines#pipeline-batching
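A rough sketch of what pipeline batching could look like here, assuming the same checkpoints as in the report (the batch_size value is only an example):

from transformers import pipeline

en_to_fr = pipeline("translation", model="Helsinki-NLP/opus-mt-en-ROMANCE", device=0)
fr_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-ROMANCE-en", device=0)

tweets = [f">>fr<< {t}" for t in df1['Tweet'].tolist()]
fr = [o['translation_text'] for o in en_to_fr(tweets, batch_size=128, max_length=256)]
back = [o['translation_text'] for o in fr_to_en(fr, batch_size=128, max_length=256)]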
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.