Any suggestions to handle longer text?

I’m trying to run predictions with the pre-trained model, and I keep running into the following error:

Token indices sequence length is longer than the specified maximum sequence length for this model (1142 > 512). Running this sequence through the model will result in indexing errors
*** RuntimeError: The size of tensor a (1142) must match the size of tensor b (512) at non-singleton dimension 1

This happens whenever I try to predict on a text longer than 512 tokens. I understand this is because the input is too long, but other than truncating the string, are there any suggestions for dealing with this problem using the package?

Thank you

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 5

Top GitHub Comments

4 reactions
sorensenjs commented, Mar 23, 2022

Just a suggestion: rather than averaging, taking the max over the splits (perhaps breaking at sentence boundaries) would likely work better. The model tends to behave as a detector, so finding objectionable content in any part of the text should disqualify the whole document.
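A minimal sketch of this suggestion in Python. It assumes the model returns one dict per chunk mapping label names (e.g. a hypothetical `toxicity` label) to probabilities; `split_sentences`, `pack_sentences` and `max_scores` are illustrative helpers, not part of the package. Sentence boundaries come from a naive regex, and chunk length is approximated by a whitespace word count, which undercounts subword tokens, so the budget should sit well below 512 or the count should come from the model's own tokenizer.

```python
import re
from typing import Callable, Dict, List


def split_sentences(text: str) -> List[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def pack_sentences(
    sentences: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Greedily group whole sentences into chunks of at most max_tokens.

    count_tokens defaults to a whitespace word count, a rough stand-in for
    the model's tokenizer. A single sentence longer than max_tokens still
    becomes its own (oversized) chunk and would need further splitting.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Close the current chunk if this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def max_scores(chunk_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine per-chunk predictions by taking the maximum for each label,
    so a single objectionable chunk flags the whole document."""
    if not chunk_scores:
        raise ValueError("need at least one chunk")
    labels = chunk_scores[0].keys()
    return {label: max(s[label] for s in chunk_scores) for label in labels}
```

Each chunk from `pack_sentences` would be scored by the model separately, and the resulting list of score dicts passed to `max_scores`.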

1 reaction
laurahanu commented, Mar 23, 2022

Hello! This package is not really designed for long-form text, and the transformer models it uses (e.g. BERT, RoBERTa) have a maximum sequence length of 512 tokens. To get around this, one option would be to split your text into chunks, feed those to the model, and then average the results. Would that work for your case?
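A minimal sketch of the chunk-and-average approach, working on token IDs so that each window respects the length limit. It assumes the text has already been tokenized (without special tokens) by the model's tokenizer, and that each chunk's prediction is a dict of label probabilities. `chunk_tokens` and `average_scores` are hypothetical helpers, and the `stride` parameter (overlap between windows) is an optional addition not suggested in the thread. Because BERT and RoBERTa add a start and an end special token to every input, the default window here is 510 rather than 512.

```python
from typing import Dict, List, Sequence


def chunk_tokens(tokens: Sequence[int], max_len: int = 510, stride: int = 0) -> List[List[int]]:
    """Split a token-ID sequence into windows of at most max_len tokens.

    stride is how many tokens consecutive windows share; 0 means no overlap.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")
    step = max_len - stride
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + max_len]))
        # Stop once this window reaches the end of the sequence.
        if start + max_len >= len(tokens):
            break
        start += step
    return chunks


def average_scores(chunk_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Average per-chunk predictions label by label."""
    if not chunk_scores:
        raise ValueError("need at least one chunk")
    labels = chunk_scores[0].keys()
    n = len(chunk_scores)
    return {label: sum(s[label] for s in chunk_scores) / n for label in labels}
```

With the 1142-token input from the error above and the default `max_len=510`, this yields three windows of 510, 510 and 122 tokens; each would be wrapped with the model's special tokens, scored separately, and the scores combined with `average_scores`.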
