BartTokenizerFast cannot decode PyTorch tensors
Environment info
- transformers version: 3.0.2
- Platform: MacOS and Linux
- Python version: 3.6 and 3.7
- PyTorch version (GPU?): 1.6.0 (no and yes)
- Tensorflow version (GPU?): N/A
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help
examples/seq2seq: @sshleifer (Discovered in #6610.)
Information
Model I am using (Bert, XLNet …): Bart.
Any Bart model (reproduced with distilbart-cnn-12-6 and distilbart-xsum-1-1).
To reproduce
Steps to reproduce the behavior:
In [1]: from transformers import BartTokenizerFast, BartForConditionalGeneration
In [2]: model = BartForConditionalGeneration.from_pretrained("sshleifer/distilbart-xsum-1-1")
In [3]: tokenizer = BartTokenizerFast.from_pretrained("sshleifer/distilbart-xsum-1-1")
In [4]: input_ids = tokenizer("This is a test string.", return_tensors="pt")
In [5]: input_ids
Out[5]: {'input_ids': tensor([[ 0, 713, 16, 10, 1296, 6755, 4, 2]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1]])}
In [6]: summary_ids = model.generate(input_ids['input_ids'], num_beams=4, max_length=5, early_stopping=True)
In [7]: print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-7-d476aca57720> in <module>
----> 1 print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids])
<ipython-input-7-d476aca57720> in <listcomp>(.0)
----> 1 print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids])
~/.pyenv/versions/finetuning-bart/lib/python3.7/site-packages/transformers/tokenization_utils_fast.py in decode(self, token_ids, skip_special_tokens, clean_up_tokenization_spaces)
437 self, token_ids: List[int], skip_special_tokens: bool = False, clean_up_tokenization_spaces: bool = True
438 ) -> str:
--> 439 text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
440
441 if clean_up_tokenization_spaces:
~/.pyenv/versions/finetuning-bart/lib/python3.7/site-packages/tokenizers/implementations/base_tokenizer.py in decode(self, ids, skip_special_tokens)
265 raise ValueError("None input is not valid. Should be a list of integers.")
266
--> 267 return self._tokenizer.decode(ids, skip_special_tokens=skip_special_tokens)
268
269 def decode_batch(
TypeError:
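The failure is a type mismatch: iterating over the `generate()` output yields 1-D `torch.Tensor` rows, while the Rust-backed fast tokenizer's `decode()` expects a plain list of ints. A minimal illustration of the distinction, assuming `torch` is installed and reusing the ids from the repro above:

```python
import torch

# Each element of the generate() output is a 1-D tensor, not a list of
# ints, which is what the fast tokenizer's decode() expects.
g = torch.tensor([0, 713, 16, 2])

print(type(g).__name__)           # Tensor
print(type(g.tolist()).__name__)  # list
```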
Expected behavior
The fast tokenizer should be able to decode the output of generate() without raising an error.
Issue Analytics
- Created 3 years ago
- Comments: 5 (4 by maintainers)
Faster snippet w same error
The snippet above, @setu4993 @LysandreJik, seems to give the same code twice. You have to pass a list of integers into the decode function. The official docs say decode can process torch tensors, but it does not work in all cases. Instead, give this a try: if the tensor is [[…]] instead of […], do
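The elided snippet presumably flattens the nested tensor into plain lists of ints before calling decode(). A minimal sketch of that conversion, assuming torch is installed and reusing the ids from the repro above (the actual tokenizer.decode call is left as a comment, since it needs the downloaded model):

```python
import torch

# Stand-in for the model.generate(...) output from the repro above:
# a 2-D tensor of token ids (a batch containing one sequence).
summary_ids = torch.tensor([[0, 713, 16, 2]])

# Convert each row to a plain Python list of ints before decoding.
# In the real script, each list would then be passed to:
#   tokenizer.decode(ids, skip_special_tokens=True,
#                    clean_up_tokenization_spaces=False)
id_lists = [g.tolist() for g in summary_ids]
print(id_lists)  # [[0, 713, 16, 2]]
```

This sidesteps the TypeError because the Rust backend receives the List[int] it expects rather than a tensor.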