return_tensors and return_text in TextGenerationPipeline don't work or only partially work
System Info
- transformers version: 4.24.0
- python version: 3.8.11
Who can help?
Library:
- Text generation: @patrickvonplaten, @Narsil, @gante
- Pipelines: @Narsil
Documentation: @sgugger, @stevhliu
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
- Initialize a TextGenerationPipeline; assume we call it pipeline below (a runnable sketch of this setup follows the list).
- Run the following code snippets:
results = pipeline(text_input, return_text=True, return_full_text=False, return_tensors=False)[0]
results = pipeline(text_input, return_text=True, return_full_text=False, return_tensors=True)[0]
results = pipeline(text_input, return_text=False, return_full_text=False, return_tensors=True)[0]
results = pipeline(text_input, return_text=False, return_full_text=False, return_tensors=False)[0]
- All four snippets return the same dict, with the single key generated_text.
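For reference, here is a minimal, self-contained version of the reproduction. The gpt2 checkpoint, the prompt, and max_new_tokens are placeholders chosen for the sketch, not details from the original report.

```python
from transformers import pipeline as load_pipeline

# "gpt2" is only a placeholder checkpoint for this sketch; the report does not name a model.
pipeline = load_pipeline("text-generation", model="gpt2")
text_input = "Hello, my name is"

for return_text, return_tensors in [(True, False), (True, True), (False, True), (False, False)]:
    results = pipeline(
        text_input,
        return_text=return_text,
        return_full_text=False,
        return_tensors=return_tensors,
        max_new_tokens=10,
    )[0]
    # Reported behavior: every combination prints only ['generated_text'].
    print(return_text, return_tensors, list(results.keys()))
```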
Expected behavior
- When return_text=True and return_tensors=False, return a dict containing only the key generated_text.
- When return_text=False and return_tensors=True, return a dict containing only the key generated_token_ids.
- When return_text=True and return_tensors=True, return a dict containing both generated_text and generated_token_ids.

A possible workaround for recovering token ids with the current behavior is sketched below.
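Until the flags interact as described above, one workaround is to request text only and recover token ids by re-encoding the output with the pipeline's tokenizer. This is only a sketch: it reuses pipeline and text_input from the reproduction, and generated_token_ids is simply the key name this issue expects, not something the pipeline returns today. Re-encoding is also not guaranteed to round-trip to the exact ids produced by generate().

```python
# Workaround sketch: get the generated text, then re-encode it to approximate the token ids.
results = pipeline(text_input, return_full_text=False, max_new_tokens=10)[0]
generated_text = results["generated_text"]

# Re-encode with the pipeline's own tokenizer to get token ids for the generated text.
generated_token_ids = pipeline.tokenizer(generated_text, add_special_tokens=False)["input_ids"]

record = {"generated_text": generated_text, "generated_token_ids": generated_token_ids}
print(record)
```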
Issue Analytics
- State:
- Created 9 months ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
Yes, the docs could use some polish here; maybe even soft deprecate return_text & co in favor of return_type. Soft deprecate meaning we don't ever have to actually remove them, just don't make them as prominent, since they are indeed confusing.

@stevhliu thanks for the reply! Now I'm clear about the functionality of, and relationship between, return_text and return_tensors, and I think it would be clearer to more people if the documentation also pointed this out. 😄
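To make the return_type suggestion in the first comment above concrete, here is an illustrative sketch (not the library's actual code) of how the three boolean flags could collapse into a single return_type value. A precedence in which return_full_text is resolved first would also explain the behavior reported here: return_tensors ends up being silently ignored.

```python
from enum import Enum

class ReturnType(Enum):
    TENSORS = 0    # return generated_token_ids
    NEW_TEXT = 1   # return only the newly generated text
    FULL_TEXT = 2  # return prompt + generated text

def resolve_return_type(return_full_text=None, return_tensors=None, return_text=None):
    # Illustration only, not the library's code. If return_full_text is resolved first,
    # return_tensors is silently ignored, which matches the observation that every call
    # in the reproduction returns only generated_text.
    if return_full_text is not None:
        return ReturnType.FULL_TEXT if return_full_text else ReturnType.NEW_TEXT
    if return_tensors:
        return ReturnType.TENSORS
    return ReturnType.FULL_TEXT  # default: prompt plus generation, as text
```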