pipeline("sentiment-analysis") - index out of range in self

Environment info

  • transformers version: 4.2.2
  • Platform: Manjaro Linux (Feb 2021)
  • Python version: 3.8.5
  • PyTorch version (GPU?): 1.7.1 (GPU)
  • Tensorflow version (GPU?):
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help

Library:

Information

Model I am using (Bert, XLNet …): distilbert-base-uncased-finetuned-sst-2-english

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: sentiment analysis
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

My dataset consists of blog articles and the comments on them. Sometimes they contain non-English characters, code snippets, or other unusual sequences.

Error occurs when:

  1. Initialize the default pipeline("sentiment-analysis") with device 0 or -1.
  2. Run the classifier with truncation=True on my dataset.
  3. After a while, the classifier fails with the following error:

CPU: Index out of range in self

GPU: /opt/conda/conda-bld/pytorch_1607370172916/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [56,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
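Both messages most likely come from the same place: a token position with no matching row in the model's position-embedding table. The plain-Python sketch below models that lookup under the assumption that DistilBERT's table has 512 rows (its config sets max_position_embeddings = 512); `lookup_positions` is a hypothetical stand-in, not the transformers or PyTorch API.

```python
from typing import List

# Simplified stand-in for a learned position-embedding table (not the transformers API).
# DistilBERT's config sets max_position_embeddings = 512, so rows 0..511 exist.
MAX_POSITION_EMBEDDINGS = 512


def lookup_positions(num_tokens: int, table_size: int = MAX_POSITION_EMBEDDINGS) -> List[int]:
    """Return the table row used for each token position, failing on a missing row."""
    rows = []
    for position in range(num_tokens):
        if position >= table_size:
            # PyTorch reports this case as "IndexError: index out of range in self" on CPU;
            # on CUDA the same bad index trips the srcIndex < srcSelectDimSize assertion.
            raise IndexError("index out of range in self")
        rows.append(position)
    return rows


print(len(lookup_positions(512)))  # a full-length input still fits
try:
    lookup_positions(600)  # e.g. a long blog post that was never truncated
except IndexError as err:
    print("IndexError:", err)
```

This would explain why the error only shows up "after some time": the run fails on the first document whose token count exceeds the table size.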

Expected behavior

At first I thought my data was breaking the tokenizer or the model, because it sometimes contains strange sequences, e.g. code, links, or stack traces.

However, if I name the model and tokenizer explicitly when initializing the pipeline, inference works fine on the same data:

classifier = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english', tokenizer='distilbert-base-uncased', device=0)
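One plausible reason the explicit tokenizer helps, consistent with the maintainer's remark further down about an ill-configured tokenizer on the hub: the tokenizer loaded by default may have no maximum length, while the one named explicitly does. The sketch below models that resolution step in plain Python. `resolve_model_max_length`, `KNOWN_MAX_INPUT_SIZES`, and the fallback sentinel are hypothetical stand-ins chosen for illustration, not the transformers API.

```python
from typing import Dict, Optional

# Assumption: a huge sentinel stands in for "no max length configured".
VERY_LARGE_INTEGER = int(1e30)

# Hypothetical registry of checkpoints whose maximum input size is known.
KNOWN_MAX_INPUT_SIZES: Dict[str, int] = {"distilbert-base-uncased": 512}


def resolve_model_max_length(checkpoint: str, configured: Optional[int] = None) -> int:
    """Pick the limit that truncation=True would truncate to."""
    if configured is not None:
        return configured  # an explicit setting always wins
    # Unknown checkpoints fall back to the sentinel, i.e. effectively no limit.
    return KNOWN_MAX_INPUT_SIZES.get(checkpoint, VERY_LARGE_INTEGER)


default_limit = resolve_model_max_length("distilbert-base-uncased-finetuned-sst-2-english")
named_limit = resolve_model_max_length("distilbert-base-uncased")
print(default_limit == VERY_LARGE_INTEGER, named_limit)
```

Under this model, the default pipeline gets a limit that never triggers, while naming "distilbert-base-uncased" as the tokenizer brings back a real 512-token limit.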

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

nikchha commented on Feb 9, 2021:

Hello!

Thank you so much! That fixed the issue. I had already suspected the missing max_length, but passing max_length=512 to the pipeline call did not help.

I had been using the truncation flag, but I guess it had no effect because the max_length value was missing.

Anyway, works perfectly now! Thank you!

LysandreJik commented on Feb 9, 2021:

Unfortunately this was due to the ill-configured tokenizer on the hub. We’re working on a more general fix to prevent this from happening in the future.

Happy to help!
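The two comments fit together: if the tokenizer has no maximum length configured, truncation=True has nothing to truncate to, so long articles still reach the model at full length. A minimal sketch of that behaviour, assuming a hypothetical `truncate` helper and a large sentinel value for "no limit" (neither is the transformers API):

```python
from typing import List, Optional

# Assumption: a huge sentinel stands in for a tokenizer with no max length configured.
VERY_LARGE_INTEGER = int(1e30)
MAX_POSITION_EMBEDDINGS = 512  # DistilBERT's position-embedding table size


def truncate(token_ids: List[int], model_max_length: Optional[int] = None) -> List[int]:
    """Hypothetical truncation step: cut to the tokenizer's limit, if it has one."""
    limit = model_max_length if model_max_length is not None else VERY_LARGE_INTEGER
    return token_ids[:limit]  # slicing past the end is safe in Python


long_article = list(range(900))  # token ids for an overly long blog post
unconfigured = truncate(long_article)  # ill-configured tokenizer: nothing is cut
configured = truncate(long_article, model_max_length=MAX_POSITION_EMBEDDINGS)
print(len(unconfigured) > MAX_POSITION_EMBEDDINGS, len(configured))
```

In transformers itself, the analogous workaround would be to give the tokenizer an explicit limit, for example by loading it with AutoTokenizer.from_pretrained(..., model_max_length=512) and passing that tokenizer to pipeline(); treat this as a suggestion to verify against your installed version rather than the confirmed fix from this thread.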
