Generating text with `model.generate` on TPU does not work
Environment info
- transformers version: 4.7.0
- Platform: Linux-5.4.0-1043-gcp-x86_64-with-glibc2.29 (Ubuntu 20.04.2 LTS)
- Python version: 3.8.5
- PyTorch version (GPU?): 1.8.1+cu102 (False)
- PyTorch XLA version: 1.8.1
- Using GPU in script?: No, using TPU
- Using distributed or parallel set-up in script?: No, using a single TPU core
Who can help
Information
Model I am using (Bert, XLNet …): facebook/m2m100_1.2B, but other text-generating models have the same problem.
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
The task I am working on is:
- an official GLUE/SQUaD task: (give the name)
- my own task or dataset: (give details below)
To reproduce
On a machine with a TPU, run:
import torch_xla.core.xla_model as xm
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model_name = 'facebook/m2m100_1.2B'
source_lang = 'en'
target_lang = 'de'
docs = [
    "This is some document to translate.",
    "And another document to translate."
]

# Acquire the XLA (TPU) device and move the model onto it.
device = xm.xla_device()
model = M2M100ForConditionalGeneration.from_pretrained(model_name).to(device)

# Tokenize the documents and move the inputs to the same device.
tokenizer = M2M100Tokenizer.from_pretrained(model_name, src_lang=source_lang)
encoded_docs = tokenizer(docs, return_tensors='pt', padding=True).to(device)

# This call never returns on TPU.
generated_tokens = model.generate(**encoded_docs, forced_bos_token_id=tokenizer.get_lang_id(target_lang))
The call to model.generate() runs without ever terminating. It seems to be stuck somewhere in the beam search.
The same code runs perfectly fine on CPUs and GPUs.
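A minimal diagnostic sketch (my own assumption, not a confirmed fix): disabling beam search and capping the output length can help tell a genuine deadlock apart from extremely slow per-step XLA recompilation. The num_beams, do_sample, and max_length arguments are standard generate() parameters; the recompilation hypothesis is only a guess.

# Greedy decoding with a hard length cap, appended to the script above. If
# this eventually finishes while beam search never does, the hang is more
# likely per-step XLA recompilation of dynamically shaped beam-search
# tensors than a true deadlock.
generated_tokens = model.generate(
    **encoded_docs,
    forced_bos_token_id=tokenizer.get_lang_id(target_lang),
    num_beams=1,      # disable beam search
    do_sample=False,  # plain greedy decoding
    max_length=32,    # bound the number of decoding steps
)
print(tokenizer.batch_decode(generated_tokens.cpu(), skip_special_tokens=True))  # .cpu() forces XLA to execute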
Expected behavior
I’d expect text generation on TPU to work the same way as on CPUs and GPUs.
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@mikcnt @divyanshuaggarwal The previous TF generate function was almost a (reduced) copy of the current PT generate function. We had to do a major rework of the TF generate function to make it compatible with XLA, so yeah… PT needs the same treatment if we want to use it with XLA 😄
I’ve shared a twitter thread today about the subject: https://twitter.com/joao_gante/status/1555527603716444160
I had an exchange with @gante about it and it seems like the code will need major refactoring for this. https://huggingface.co/spaces/joaogante/tf_xla_generate_benchmarks/discussions/1#62eb9350985a691200cf2921
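For context on what the XLA-compatible path looks like, the TF approach the maintainers describe above compiles the generation loop roughly like this (a sketch assuming transformers >= 4.21 with TensorFlow installed; t5-small is used purely for illustration, since a TF port of M2M100 may not be available):

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Compile the whole generation loop with XLA. Padding the inputs to a fixed
# length avoids retracing every time the input shape changes.
xla_generate = tf.function(model.generate, jit_compile=True)
inputs = tokenizer(
    ["translate English to German: This is some document to translate."],
    return_tensors="tf", padding="max_length", max_length=64,
)
outputs = xla_generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

The first call pays the compilation cost; later calls with the same input shapes reuse the compiled graph, which is the treatment the PyTorch generate loop would need on XLA.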