
return_tensors and return_text in TextGenerationPipeline don't work or partially work


System Info

  • transformers version: 4.24.0
  • python version: 3.8.11

Who can help?

Library:

Documentation: @sgugger, @stevhliu

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

  1. Initialize a TextGenerationPipeline; assume we call it pipeline below.
  2. Run the following code snippets:
results = pipeline(text_input, return_text=True, return_full_text=False, return_tensors=False)[0]
results = pipeline(text_input, return_text=True, return_full_text=False, return_tensors=True)[0]
results = pipeline(text_input, return_text=False, return_full_text=False, return_tensors=True)[0]
results = pipeline(text_input, return_text=False, return_full_text=False, return_tensors=False)[0]
  3. All four code snippets return the same dict with only one key, generated_text (a self-contained reproduction sketch follows).
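
A minimal end-to-end sketch of the reproduction, assuming a hypothetical model ("gpt2") and a placeholder prompt, neither of which comes from the report:

from transformers import pipeline

# Placeholder model and prompt; any causal LM would do.
generator = pipeline("text-generation", model="gpt2")
text_input = "Hello, my name is"

# The four parameter combinations from the report.
combinations = [(True, False), (True, True), (False, True), (False, False)]
for return_text, return_tensors in combinations:
    results = generator(
        text_input,
        return_text=return_text,
        return_full_text=False,
        return_tensors=return_tensors,
    )[0]
    # With transformers 4.24.0, every combination reportedly yields the same keys.
    print(return_text, return_tensors, list(results.keys()))  # -> ['generated_text']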

Expected behavior

  1. When return_text=True and return_tensors=False, return a dict containing only one key, generated_text.
  2. When return_text=False and return_tensors=True, return a dict containing only one key, generated_token_ids.
  3. When return_text=True and return_tensors=True, return a dict containing both generated_text and generated_token_ids (sketched below).
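
Expressed as a sketch, these assertions encode the behavior the reporter expected, not what 4.24.0 actually returns; generator and text_input are the placeholders from the reproduction sketch above:

# Expected (not actual) behavior, per the list above.
out = generator(text_input, return_text=True, return_full_text=False, return_tensors=False)[0]
assert set(out) == {"generated_text"}

out = generator(text_input, return_text=False, return_full_text=False, return_tensors=True)[0]
assert set(out) == {"generated_token_ids"}

out = generator(text_input, return_text=True, return_full_text=False, return_tensors=True)[0]
assert set(out) == {"generated_text", "generated_token_ids"}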

Issue Analytics

  • State: closed
  • Created: 9 months ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
Narsil commented, Dec 7, 2022

Yes, the docs could use some polish here; maybe even soft-deprecate return_text & co in favor of return_type. Soft-deprecating means we never have to actually remove them, just stop making them as prominent, since they are indeed confusing.

1 reaction
PanQiWei commented, Dec 7, 2022

(Quoting @stevhliu:) I may be wrong, but I think return_type is an internal parameter; you can still decide what to return with the other three parameters.

As far as I can tell, you can’t return a combination of generated_text and generated_token_ids. You can only return one or the other, which I guess is why some of those combinations don’t do anything. Would it help if there was a note in the docs about this?

@stevhliu thanks for the reply! Now I'm clear on the functionality of and relationship between return_text and return_tensors, and I think it would be clearer to more people if the documentation also pointed this out. 😄
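
A usage sketch consistent with the explanation above, reusing the placeholder generator and text_input from the reproduction sketch: pass only one of the flags, and expect exactly one output type.

# Text output (the default): a dict with "generated_text".
text_out = generator(text_input)[0]
print(list(text_out.keys()))

# Token-id output: pass only return_tensors=True and leave the text flags unset;
# this should yield a dict with "generated_token_ids" instead.
ids_out = generator(text_input, return_tensors=True)[0]
print(list(ids_out.keys()))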
