Refactor PyTorch `model.generate` method to work on TPU
See original GitHub issue

Feature request
Refactor the PyTorch version of the `model.generate` method for text-generating models to make it compatible with XLA and speed up inference on TPU.
Motivation
Right now, `model.generate` in PyTorch is extremely slow on TPU compared to CPU and GPU. This is likely because some operations in the PyTorch implementation of `model.generate` are not XLA-compatible, so the generation process falls back to CPU, which makes inference on TPU infeasible. A major refactoring has already been done on its TensorFlow counterpart, so it would be nice to have the PyTorch version working as well.
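A common reason `generate`-style loops are slow under XLA (beyond unsupported ops) is dynamic tensor shapes: XLA compiles one graph per shape, and a naive decode loop that grows the sequence by one token per step forces a recompilation at every step. The sketch below is a hypothetical, dependency-free illustration of that shape behavior, not the actual `transformers` code; the function names and shape accounting are assumptions for illustration only.

```python
# Hypothetical illustration: count the distinct (batch, seq_len) shapes an
# XLA compiler would see. Each distinct shape means one compilation.

def dynamic_shapes(prompt_len, max_new_tokens):
    """Shapes seen by a naive decode loop that grows the sequence each step."""
    shapes = set()
    length = prompt_len
    for _ in range(max_new_tokens):
        shapes.add((1, length))  # seq_len grows by one every iteration
        length += 1
    return shapes

def static_shapes(prompt_len, max_length):
    """XLA-friendly variant: pad to a fixed max_length up front."""
    shapes = set()
    for _ in range(max_length - prompt_len):
        shapes.add((1, max_length))  # shape never changes -> compile once
    return shapes

print(len(dynamic_shapes(8, 20)))  # → 20 distinct shapes, ~20 compilations
print(len(static_shapes(8, 28)))   # → 1 shape, compiled once and reused
```

This is essentially the trick the TensorFlow refactor relies on: padding inputs to fixed lengths so the traced generation graph can be compiled once and reused.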
A more in-depth discussion with @gante took place in #12322 and on this huggingface discussion.
Your contribution
If there is some interest from the HF team, I can definitely assist during the work.
Issue Analytics
- State:
- Created: a year ago
- Reactions: 4
- Comments: 9 (5 by maintainers)
Top Results From Across the Web
Running PyTorch on TPU: a bag of tricks | by Zahar Chikishev
1) DataParallel holds copies of the model object (one per TPU device), which are kept synchronized with identical weights. · 2) DataParallel ...
PyTorch TPU starter - DeBERTa-v3-large (training) | Kaggle
Let's start training! To do so, we start by initializing the model. We use the xmp.MpModelWrapper provided by PyTorch XLA to save memory...
Tensor Processing Unit (TPU) - PyTorch Lightning
All TPU VMs in a Pod setup are required to access the model code and data. One easy way to achieve this is...
Getting Started with PyTorch on Cloud TPUs - Google Colab
PyTorch /XLA is a package that lets PyTorch connect to Cloud TPUs and use TPU cores as devices. Colab provides a free Cloud...
huggingface/transformers: Model versioning, TensorFlow ...
... new scripts, refactor of the generate method Model versioning We host ... examples/docs: caveat that PL examples don't work on TPU #8309 ......
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Added to my `generate` task queue 👍

@divyanshuaggarwal it would be part of `transformers`! This is not a prioritized feature, as you can already use TPUs for generation in Flax and TensorFlow. Since you can easily convert a model from one framework to the other, there is an easy workaround 😃
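The workaround above might be sketched as follows. This is only an illustration, not code from the issue: `"gpt2"` is a placeholder checkpoint, and the snippet assumes `transformers` with its Flax/JAX dependencies (plus `torch` for the conversion) is installed; running it on TPU additionally requires a JAX TPU runtime.

```python
# Hedged sketch of the suggested workaround: load a PyTorch checkpoint
# into the Flax port, whose generate() is XLA-friendly, and decode there.
# "gpt2" is only a placeholder model name.
from transformers import AutoTokenizer, FlaxAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
# from_pt=True converts the PyTorch weights to Flax at load time
model = FlaxAutoModelForCausalLM.from_pretrained("gpt2", from_pt=True)

inputs = tokenizer("TPU generation works in", return_tensors="np")
outputs = model.generate(inputs["input_ids"], max_length=20)
print(tokenizer.decode(outputs.sequences[0], skip_special_tokens=True))
```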