
unclear `prepare_seq2seq_batch` deprecation

See original GitHub issue

When using prepare_seq2seq_batch, the user now gets:

transformers-master/src/transformers/tokenization_utils_base.py:3277: FutureWarning: prepare_seq2seq_batch is deprecated and will be removed in version 5 of 🤗 Transformers. Use the regular __call__ method to prepare your inputs and the tokenizer under the with_target_tokenizer context manager to prepare your targets. See the documentation of your specific tokenizer for more details.

It’s very hard to act on: I’m not sure what “regular __call__ method” refers to, and I couldn’t find any tokenizer documentation that ever mentions with_target_tokenizer.

Perhaps this is an unintended typo? Was it meant to be with target_tokenizer? With FooTokenizer?

Please suggest a more user-friendly deprecation message, with at least one example or a link to one.

Thank you.

@sgugger, @LysandreJik

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

7 reactions
padmalcom commented, Sep 14, 2021

I stumbled upon this issue when googling the warning. For the translation task, `tokenized_text = tokenizer.prepare_seq2seq_batch([text], return_tensors='pt')` has to be replaced by this:

with tokenizer.as_target_tokenizer():
    tokenized_text = tokenizer(text, return_tensors='pt')

Which is much clearer than using prepare_seq2seq_batch, but for anyone coming to Python from other languages, the concept of __call__ might not be transparent at first 😃
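
For a full source-plus-target batch, the replacement looks roughly like this (a minimal sketch; the checkpoint name and example sentences are only illustrative):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")  # illustrative checkpoint

src_texts = ["Hello, how are you?"]
tgt_texts = ["Hallo, wie geht es dir?"]

# Deprecated:
# batch = tokenizer.prepare_seq2seq_batch(src_texts, tgt_texts, return_tensors="pt")

# Replacement: the regular __call__ tokenizes the source texts ...
batch = tokenizer(src_texts, padding=True, truncation=True, return_tensors="pt")

# ... and the targets are tokenized inside the as_target_tokenizer context manager;
# their input_ids become the labels.
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, padding=True, truncation=True, return_tensors="pt")
batch["labels"] = labels["input_ids"]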

1 reaction
sgugger commented, Jul 12, 2021

Why is __call__ hard to understand? It’s the regular Python method for when the tokenizer is called directly on inputs. How would you formulate that better?

As for with_target_tokenizer, it is indeed a typo: it should be as_target_tokenizer.

As for an example, this is what is used in every example script; see for instance the run_translation script.

I’m curious: where did you still find a reference to this method? As far as I know it’s been removed from all examples and documentation (and it was deprecated five months ago).
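
The preprocessing in that script follows roughly this pattern (a sketch, not the script’s exact code; the dataset column name, language keys, and max_length are placeholders):

def preprocess_function(examples):
    # assumes a translation dataset whose rows look like {"translation": {"en": ..., "de": ...}}
    inputs = [ex["en"] for ex in examples["translation"]]
    targets = [ex["de"] for ex in examples["translation"]]

    # source texts go through the regular tokenizer __call__
    model_inputs = tokenizer(inputs, max_length=128, truncation=True)

    # target texts are tokenized inside the as_target_tokenizer context manager
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(targets, max_length=128, truncation=True)

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# typically applied to a datasets.Dataset with .map(preprocess_function, batched=True)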

Read more comments on GitHub >
