
unclear `prepare_seq2seq_batch` deprecation

See original GitHub issue

When using prepare_seq2seq_batch, the user now gets:

transformers-master/src/transformers/tokenization_utils_base.py:3277: FutureWarning: prepare_seq2seq_batch is deprecated and will be removed in version 5 of 🤗 Transformers. Use the regular __call__ method to prepare your inputs and the tokenizer under the with_target_tokenizer context manager to prepare your targets. See the documentation of your specific tokenizer for more details.

It’s very hard to act on: I’m not sure what “regular __call__ method” refers to, and I couldn’t find any tokenizer documentation that ever mentions with_target_tokenizer.

Perhaps this is an unintended typo? Was it meant to be with target_tokenizer? With FooTokenizer?

Please suggest a more user-friendly deprecation message, with at least one example or a link to one.

Thank you.

@sgugger, @LysandreJik

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

7 reactions
padmalcom commented, Sep 14, 2021

I stumbled upon this issue when googling the warning. For the translation task, `tokenized_text = tokenizer.prepare_seq2seq_batch([text], return_tensors='pt')` has to be replaced by this:

with tokenizer.as_target_tokenizer():
    tokenized_text = tokenizer(text, return_tensors='pt')

Which is much clearer than using prepare_seq2seq_batch, but for anyone coming to Python from other languages, the concept of __call__ might not be transparent at first 😃
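
For a full source-plus-target batch, the replacement looks roughly like this (a minimal sketch; the checkpoint name and example sentences are only illustrative):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")  # illustrative checkpoint

src_texts = ["Hello, how are you?"]
tgt_texts = ["Hallo, wie geht es dir?"]

# Deprecated:
# batch = tokenizer.prepare_seq2seq_batch(src_texts, tgt_texts, return_tensors="pt")

# Replacement: the regular __call__ tokenizes the source texts ...
batch = tokenizer(src_texts, padding=True, truncation=True, return_tensors="pt")

# ... and the targets are tokenized inside the as_target_tokenizer context manager;
# their input_ids become the labels.
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, padding=True, truncation=True, return_tensors="pt")
batch["labels"] = labels["input_ids"]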

1 reaction
sgugger commented, Jul 12, 2021

Why is __call__ hard to understand? It’s the regular Python method for when the tokenizer is called directly on inputs. How would you formulate that better?

As for with_target_tokenizer, it is indeed a typo: it should be as_target_tokenizer.

As for an example, this is what is used in every example script; see for instance the run_translation script.

I’m curious: where did you still find a reference to this method? As far as I know it’s been removed from all examples and documentation (and it was deprecated five months ago).
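
The preprocessing in that script follows roughly this pattern (a sketch, not the script’s exact code; the dataset column name, language keys, and max_length are placeholders):

def preprocess_function(examples):
    # assumes a translation dataset whose rows look like {"translation": {"en": ..., "de": ...}}
    inputs = [ex["en"] for ex in examples["translation"]]
    targets = [ex["de"] for ex in examples["translation"]]

    # source texts go through the regular tokenizer __call__
    model_inputs = tokenizer(inputs, max_length=128, truncation=True)

    # target texts are tokenized inside the as_target_tokenizer context manager
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(targets, max_length=128, truncation=True)

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# typically applied to a datasets.Dataset with .map(preprocess_function, batched=True)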

Read more comments on GitHub >
