
Does max_seq_length specify the maximum number of words?

See original GitHub issue

I’m trying to figure out how the --max_seq_length parameter works in run_classifier. Based on the source, it seems like it represents the number of words? Is that correct?

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

9 reactions
rodgzilla commented, Dec 10, 2018

max_seq_length specifies the maximum number of tokens in the input. The number of tokens is greater than or equal to the number of words, because the tokenizer may split a single word into several subword pieces.

For example, the following sentence:

The man hits the saxophone and demonstrates how to properly use the racquet.

is tokenized as follows:

the man hits the saxophone and demonstrates how to properly use the ra ##c ##quet .

Depending on the task, 2 to 3 additional special tokens ([CLS] and [SEP]) are added to format the input, and these also count toward max_seq_length.
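The word-vs-token distinction can be illustrated with a minimal greedy longest-match subword tokenizer. This is only a sketch of the WordPiece idea, not the real BERT tokenizer: the tiny vocabulary below is hypothetical, chosen so the example sentence above tokenizes the same way.

```python
# Sketch of greedy longest-match WordPiece tokenization.
# VOCAB is a toy, hand-picked vocabulary; real BERT uses a
# learned vocabulary of roughly 30k entries.
VOCAB = {"the", "man", "hits", "saxophone", "and", "demonstrates",
         "how", "to", "properly", "use", "ra", "##c", "##quet", "."}

def wordpiece(word):
    """Split one lowercased word into subword tokens from VOCAB."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces are prefixed
            if piece in VOCAB:
                tokens.append(piece)
                break
            end -= 1
        else:
            return ["[UNK]"]  # no vocabulary entry matched
        start = end
    return tokens

sentence = ("the man hits the saxophone and demonstrates "
            "how to properly use the racquet .")
tokens = [t for w in sentence.split() for t in wordpiece(w)]
print(tokens)
# "racquet" alone becomes 3 tokens (ra, ##c, ##quet),
# so 14 whitespace-separated words yield 16 tokens.
```

The point is that `max_seq_length` bounds the length of this token list (plus the special tokens), not the word count of the raw sentence.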

2 reactions
thomwolf commented, Apr 23, 2019

@tsungruihon yes, just use smaller sentences

@echan00 there is no automatic cut-off, but the tokenizer warns when your inputs are too long, and the model will throw an error. You have to limit the size manually.
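Limiting the size manually can be sketched as below. This is an illustrative helper, not an API of the library: it assumes a BERT-style single-sequence input, which reserves 2 of the `max_seq_length` positions for the [CLS] and [SEP] special tokens.

```python
# Hypothetical helper: truncate a token list so the final input,
# including [CLS] and [SEP], fits within max_seq_length.
def truncate_for_bert(tokens, max_seq_length):
    body = tokens[: max_seq_length - 2]  # reserve room for [CLS]/[SEP]
    return ["[CLS]"] + body + ["[SEP]"]

# A 600-token input is cut down to fit BERT's usual 512 limit.
long_input = [f"tok{i}" for i in range(600)]
model_input = truncate_for_bert(long_input, 512)
print(len(model_input))  # 512
```

Sequence-pair tasks would need to reserve 3 positions ([CLS], [SEP], [SEP]) and split the budget between the two segments.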


Top Results From Across the Web

token indices sequence length is longer than the specified ...
When I use Bert, the "token indices sequence length is longer than the specified maximum sequence length for this model (1017 > 512)"...
Read more >
What is the length limit of Transformers? - Cross Validated
There is no theoretical limit on the input length (ie number of tokens for a sentence in NLP) for transformers.
Read more >
Preprocessing data — transformers 3.0.2 documentation
True or 'only_first' truncate to a maximum length specified by the max_length argument or the maximum length accepted by the model if no...
Read more >
[D] Why is the maximum input sequence length of BERT is ...
The transformer's attention is quadratic in sentence length. I think they limit it to 512 to reach a balance between performance and memory ...
Read more >
How to use Bert for long text classification? - nlp - Stack Overflow
We know that BERT has a max length limit ...
Read more >
