Does max_seq_length specify the maximum number of words?
See original GitHub issue

I’m trying to figure out how the --max_seq_length parameter works in run_classifier. Based on the source, it seems like it represents the number of words? Is that correct?
Issue Analytics
- State:
- Created 5 years ago
- Comments:7 (3 by maintainers)

max_seq_length specifies the maximum number of tokens in the input. The number of tokens is greater than or equal to the number of words, because the WordPiece tokenizer can split a single word into several subword tokens.
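The split into subword tokens can be sketched with a greedy longest-match WordPiece-style tokenizer. This is a minimal illustration with a tiny hypothetical vocabulary, not BERT's real ~30k-entry vocab, so the exact splits are assumptions:

```python
# Toy vocabulary for illustration only; BERT's actual WordPiece vocab differs.
VOCAB = {"un", "##aff", "##able", "deal", "##s"}

def wordpiece_tokenize(word, vocab=VOCAB):
    """Split one word into subword tokens via greedy longest-match."""
    tokens, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation-piece marker
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no match: the whole word becomes unknown
        tokens.append(piece)
        start = end
    return tokens

print(wordpiece_tokenize("unaffable"))  # ['un', '##aff', '##able']
print(wordpiece_tokenize("deals"))      # ['deal', '##s']
```

So one word can become two or three tokens, which is why an input's token count can exceed its word count.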
And depending on the task, 2 to 3 additional special tokens ([CLS] and [SEP]) are added to the input to format it.

@tsungruihon yes, just use smaller sentences.
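The special tokens count toward max_seq_length, so the budget left for actual text is max_seq_length minus 2 (single sentence) or minus 3 (sentence pair). A small sketch of that formatting, with illustrative names rather than the real run_classifier code:

```python
def build_input(tokens_a, tokens_b=None, max_seq_length=128):
    """Format tokens the way BERT expects and check the length budget."""
    # [CLS] A [SEP] uses 2 special tokens; [CLS] A [SEP] B [SEP] uses 3.
    specials = 3 if tokens_b else 2
    budget = max_seq_length - specials
    assert len(tokens_a) + len(tokens_b or []) <= budget, "input too long"
    seq = ["[CLS]"] + tokens_a + ["[SEP]"]
    if tokens_b:
        seq += tokens_b + ["[SEP]"]
    return seq

print(build_input(["hello", "world"]))
# ['[CLS]', 'hello', 'world', '[SEP]']
```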
@echan00 no, there is no automatic cut-off, but the tokenizer warns when your inputs are too long and the model will throw an error. You have to limit the size manually.
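Limiting the size manually can be as simple as truncating the token list before the special tokens are added. A hedged sketch, with hypothetical names (not from run_classifier):

```python
def truncate_tokens(tokens, max_seq_length=512, num_special=2):
    """Keep only as many tokens as fit once [CLS]/[SEP] are accounted for."""
    return tokens[: max_seq_length - num_special]

# 600 tokens won't fit in BERT's 512 limit; truncate to 510 + 2 specials.
toks = ["tok%d" % i for i in range(600)]
print(len(truncate_tokens(toks, max_seq_length=512)))  # 510
```

For long documents, people also split the text into overlapping chunks and classify each chunk, but simple truncation is the most direct fix.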