Feature request: Add built-in support for autoregressive text generation with ONNX models
After converting an autoregressive model to ONNX, it would be nice to be able to generate text with it via something like:
from transformers import OnnxTextGenerationModel, AutoTokenizer

model_path = "gpt-something.onnx"
tokenizer_name = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
model = OnnxTextGenerationModel(model_path)

# and then
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors="np")  # ONNX Runtime consumes NumPy arrays
output = model.generate(**encoded_input)
This should come with support for past_key_values handled internally in the most efficient way, i.e. reusing the attention key/value cache so that each decoding step only processes the newly generated token.
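For context, this is roughly the past_key_values plumbing one has to write by hand today when driving such an export directly with onnxruntime. It is a minimal sketch under assumptions, not an existing API: the tensor names (input_ids, attention_mask, past_key_values.*.key/value on the input side; logits, present.*.key/value on the output side) follow the common GPT-2-style export convention and can vary between exports.

import onnxruntime as ort

session = ort.InferenceSession("gpt-something.onnx")
# Assumes the first output is "logits" and the remaining outputs are the
# "present.*" key/value tensors of each attention layer.
output_names = [output.name for output in session.get_outputs()]

def decode_step(input_ids, attention_mask, past):
    # Feed the cached keys/values from the previous step alongside the inputs
    # (all values are NumPy arrays).
    feeds = {"input_ids": input_ids, "attention_mask": attention_mask, **past}
    outputs = session.run(output_names, feeds)
    logits = outputs[0]
    # Rename each "present.*" output to its matching "past_key_values.*" input
    # so the cache can be fed straight back in on the next step.
    past = {
        name.replace("present", "past_key_values"): value
        for name, value in zip(output_names[1:], outputs[1:])
    }
    return logits, past

On the first call the past entries must still be supplied (typically as zero-length tensors whose exact shapes depend on the export); after that, each step feeds only the single newest token plus the cache, which is what makes incremental decoding cheap and what a built-in OnnxTextGenerationModel could handle transparently.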
Motivation
When trying to accelerate inference with transformers, it is frustrating to be unable to load an ONNX model with the library and call a model.generate method to seamlessly generate sequences and perform beam search. This forces us to rely on custom implementations, which take time to write and are far more prone to bugs.
We can try to hack together a subclass of GenerationMixin, but having to convert tensors to and from PyTorch on every decoding step makes everything too slow.
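To make that workaround concrete, here is a rough sketch of such a hack: wrapping an onnxruntime session so that GenerationMixin's decoding loop can drive it. The class name is made up, the exact set of attributes and methods generate() requires varies across transformers versions (as does the import path for GenerationMixin), and the "logits" output name is an assumption; the point is the torch-to-numpy-and-back round-trip paid on every decoding step.

import torch
import onnxruntime as ort
from transformers import AutoConfig, GenerationMixin
from transformers.modeling_outputs import CausalLMOutput

class OnnxCausalLMHack(GenerationMixin):  # hypothetical name, not a real class
    main_input_name = "input_ids"

    def __init__(self, onnx_path, config):
        self.config = config
        self.device = torch.device("cpu")
        self.session = ort.InferenceSession(onnx_path)

    def prepare_inputs_for_generation(self, input_ids, **kwargs):
        return {"input_ids": input_ids,
                "attention_mask": kwargs.get("attention_mask")}

    def __call__(self, input_ids, attention_mask=None, **kwargs):
        # Every decoding step pays for a torch -> numpy copy here ...
        feeds = {
            "input_ids": input_ids.cpu().numpy(),
            "attention_mask": attention_mask.cpu().numpy(),
        }
        logits = self.session.run(["logits"], feeds)[0]
        # ... and a numpy -> torch copy here, only so that the PyTorch-centric
        # sampling/beam-search code in GenerationMixin can consume the result.
        return CausalLMOutput(logits=torch.from_numpy(logits))

model = OnnxCausalLMHack("gpt-something.onnx", AutoConfig.from_pretrained("gpt2"))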
Your contribution
I can try submitting a PR, but it would take a while, as I work full-time and may not have enough time to move quickly.
Comments: 8 (8 by maintainers)
Yes, this is planned. Nice to know that there is interest in such features!
Pinging @lewisbails and @philschmid, as they were the ones who suggested adding this kind of feature to optimum. We are following this discussion in https://github.com/huggingface/optimum/issues/55.