
Misleading error when getting empty span's `text` attribute


Slicing an empty span from a Doc works without errors (e.g. doc[:0] or doc[3:3]), but accessing the text attribute of the resulting empty span raises IndexError: [E201] Span index out of range. The text_with_ws attribute, however, works fine, so it will do as a workaround for my usage, but the behaviour seems very unintuitive.
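For code that has to run on the affected version, a guard that checks the span length before touching .text avoids the error without changing the result for non-empty spans. This is a minimal sketch: safe_span_text is a hypothetical helper name, and it relies only on len(span) and span.text. Unlike the text_with_ws workaround, it does not include the trailing whitespace of the span's last token.

```python
from typing import Any


def safe_span_text(span: Any) -> str:
    """Return span.text, or "" for a 0-length span.

    Hypothetical helper: it checks len(span) first, so span.text is never
    evaluated on an empty span (the case that raises E201 in the report).
    """
    if len(span) == 0:
        return ""
    return span.text


# Usage with spaCy (assumes a loaded pipeline `nlp`):
#   doc = nlp("something")
#   safe_span_text(doc[:0])  # -> ""
#   safe_span_text(doc[:1])  # -> "something"
```

The length check is cheap and keeps the workaround local to the call site, so it can be dropped once the installed spaCy version no longer raises on empty spans.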

How to reproduce the behaviour

import spacy

nlp = spacy.load('en')  # v2 shortcut link; shortcuts were removed in v3, use e.g. spacy.load('en_core_web_sm') there
doc = nlp('something')
print(doc[:0].text_with_ws)  # prints empty string, as expected
print(doc[:0].text)  # throws IndexError: [E201] Span index out of range

Your Environment

spaCy version: 2.3.5
Platform: Linux-5.8.0-38-generic-x86_64-with-glibc2.29
Python version: 3.8.5
Models: en

Issue Analytics

Created 3 years ago; 7 comments (6 by maintainers)

Top GitHub Comments

1 reaction

adrianeboyd commented, Feb 8, 2021

I think it’s both: the span bounds check in __getitem__ makes span.text fail for a 0-length span, which it didn’t in earlier versions, because you used to be able to access span[-1] even for 0-length spans (it would return the token before the span). That’s the only thing that changed between v2.3.2 and v2.3.5, and the missing bounds check was masking the .text bug.
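The interaction described in this comment can be reproduced with a small, self-contained toy model. It is not spaCy's actual Span code: the Token and Span classes here are stand-ins, and the sketch assumes, following the comment, that text is derived from text_with_ws by inspecting the last token through span[-1], and that negative indices used to resolve relative to the span's end with no bounds check. The bounds_check flag and the text_fixed property are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    text: str
    whitespace_: str  # "" or " ", mirroring spaCy's Token.whitespace_


@dataclass
class Span:
    """Toy model of a span over a token list (not spaCy's implementation)."""

    tokens: List[Token]
    start: int
    end: int
    bounds_check: bool = True  # hypothetical toggle for the newer behaviour

    def __len__(self) -> int:
        return self.end - self.start

    def __getitem__(self, i: int) -> Token:
        if self.bounds_check and not -len(self) <= i < len(self):
            raise IndexError("[E201] Span index out of range.")
        # Negative indices count back from the span's end, so without the
        # check, span[-1] on a 0-length span resolves to tokens[end - 1].
        return self.tokens[self.end + i if i < 0 else self.start + i]

    @property
    def text_with_ws(self) -> str:
        return "".join(t.text + t.whitespace_ for t in self.tokens[self.start:self.end])

    @property
    def text(self) -> str:
        # Buggy pattern: always inspects the last token, even if there is none.
        text = self.text_with_ws
        if self[-1].whitespace_:
            text = text[:-1]
        return text

    @property
    def text_fixed(self) -> str:
        # Hypothetical fix: only inspect the last token when the span is non-empty.
        text = self.text_with_ws
        if len(self) > 0 and self[-1].whitespace_:
            text = text[:-1]
        return text
```

Toggling bounds_check shows the masking: with it off, an empty span's text comes back as an empty string because span[-1] quietly returns a neighbouring token; with it on, the same access raises IndexError, while the guarded text_fixed works either way.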

1 reaction

svlandeg commented, Feb 3, 2021

Hi @mzeidhassan: Could be, yes. I’ll leave a message on that thread to clarify.