
Generate text with `model.generate` on TPU does not work


Environment info

  • transformers version: 4.7.0
  • Platform: Linux-5.4.0-1043-gcp-x86_64-with-glibc2.29 (Ubuntu 20.04.2 LTS)
  • Python version: 3.8.5
  • PyTorch version (GPU?): 1.8.1+cu102 (False)
  • PyTorch XLA version: 1.8.1
  • Using GPU in script?: No, using TPU
  • Using distributed or parallel set-up in script?: No, using a single TPU core

Who can help

@patrickvonplaten

Information

Model I am using (Bert, XLNet …): facebook/m2m100_1.2B, but other text-generating models have the same problem.

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

On a machine with a TPU, run:

import torch_xla.core.xla_model as xm
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model_name = 'facebook/m2m100_1.2B'
source_lang = 'en'
target_lang = 'de'

docs = [
    "This is some document to translate.",
    "And another document to translate."
]

# Acquire the XLA (TPU) device and move the model onto it
device = xm.xla_device()
model = M2M100ForConditionalGeneration.from_pretrained(model_name).to(device)

tokenizer = M2M100Tokenizer.from_pretrained(model_name, src_lang=source_lang)
encoded_docs = tokenizer(docs, return_tensors='pt', padding=True).to(device)

# This call hangs on TPU (it returns normally on CPU and GPU)
generated_tokens = model.generate(**encoded_docs, forced_bos_token_id=tokenizer.get_lang_id(target_lang))

The call to model.generate() never returns; it appears to hang somewhere in the beam search.
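My best guess is that XLA keeps recompiling the graph on every decoding step, since beam search produces tensors whose shapes change as decoding progresses. One way to check this (a diagnostic sketch using torch_xla's built-in metrics report, not something I have confirmed on this exact setup):

import torch_xla.debug.metrics as met

# Run generate() with a small max_length (or interrupt the stuck call),
# then dump the XLA metrics; a CompileTime count that keeps growing with
# each decoding step points at shape-driven recompilation.
print(met.metrics_report())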

The same code runs perfectly fine on CPUs and GPUs.
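As a stopgap, generation can be moved to the CPU while the rest of the pipeline stays on the TPU (a workaround sketch, assuming the model fits in host memory and CPU throughput is acceptable):

# Move the model (in place) to CPU and mirror the encoded inputs there
model_cpu = model.to('cpu')
encoded_cpu = {k: v.to('cpu') for k, v in encoded_docs.items()}

generated_tokens = model_cpu.generate(
    **encoded_cpu,
    forced_bos_token_id=tokenizer.get_lang_id(target_lang),
)
translations = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)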

Expected behavior

I’d expect text generation on a TPU to work the same way it does on CPU and GPU.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 2
  • Comments: 13 (7 by maintainers)

Top GitHub Comments

gante commented on Aug 5, 2022 (2 reactions)

@mikcnt @divyanshuaggarwal The previous TF generate function was almost a (reduced) copy of the current PT generate function. We had to do a major rework of the TF generate function to make it compatible with XLA, so yeah… PT needs the same treatment if we want to use it with XLA 😄

I shared a Twitter thread about the subject today: https://twitter.com/joao_gante/status/1555527603716444160
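(For reference, the XLA-compatible TF path described above looks roughly like this; a minimal sketch based on the linked material, with t5-small as a placeholder model and illustrative lengths:)

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained('t5-small')
model = TFAutoModelForSeq2SeqLM.from_pretrained('t5-small')

# Compile generate with XLA; padding to a fixed length avoids retracing
# on every new input shape.
xla_generate = tf.function(model.generate, jit_compile=True)

inputs = tokenizer(
    ["translate English to German: This is some document to translate."],
    return_tensors='tf', padding='max_length', max_length=64,
)
generated = xla_generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))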

divyanshuaggarwal commented on Aug 4, 2022 (2 reactions)

Is there any update on this?

I had an exchange with @gante about it and it seems like the code will need major refactoring for this. https://huggingface.co/spaces/joaogante/tf_xla_generate_benchmarks/discussions/1#62eb9350985a691200cf2921
