Misleading error when getting empty span's `text` attribute
Slicing an empty span from a Doc works without errors (e.g. `doc[:0]` or `doc[3:3]`), but accessing the `text` attribute of the resulting empty span throws `IndexError: [E201] Span index out of range`. However, the `text_with_ws` attribute works fine, so it’ll do as a workaround for my usage, but this seems like very unintuitive behaviour.
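For anyone who needs `text` rather than `text_with_ws`, one possible workaround is to guard on the span length before touching `text`. This is only a sketch: `span_text` is a hypothetical helper name, not part of spaCy, and it assumes nothing beyond `len(span)` and `span.text` being available on the object passed in.

```python
def span_text(span):
    """Return span.text, but give "" for a zero-length span.

    Hypothetical helper, not part of spaCy. It relies only on
    len(span) and span.text.
    """
    if len(span) == 0:
        # Skip .text entirely; this is the access that raises E201
        # on an empty span in the affected version.
        return ""
    return span.text
```

With the reproduction in this issue, `span_text(doc[:0])` should return an empty string instead of raising, while non-empty spans go through `.text` unchanged.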
How to reproduce the behaviour
import spacy
nlp = spacy.load('en')
doc = nlp('something')
print(doc[:0].text_with_ws) # prints empty string, as expected
print(doc[:0].text) # throws IndexError: [E201] Span index out of range
Your Environment
- spaCy version: 2.3.5
- Platform: Linux-5.8.0-38-generic-x86_64-with-glibc2.29
- Python version: 3.8.5
- Models: en
Issue Analytics
- State:
- Created 3 years ago
- Comments:7 (6 by maintainers)
Top GitHub Comments
I think it’s both: the span bounds check in `__getitem__` leads to `span.text` failing for a 0-length span when it didn’t in earlier versions, because you used to be able to access `span[-1]` even for 0-length spans (it would return the token before the span). That’s the only thing that changed between v2.3.2 and v2.3.5, and the lack of the bounds check was masking the `.text` bug.

Hi @mzeidhassan: Could be, yes, I’ll leave a message on that thread to clarify.
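The mechanism described in the first comment can be modelled in plain Python. The sketch below is a toy, not spaCy's source: `ToyToken`, `ToySpan` and the `check_bounds` flag are made-up names, and the assumption that `text` looks at `self[-1]` to trim trailing whitespace is inferred from the comment rather than confirmed here.

```python
class ToyToken:
    def __init__(self, text, whitespace):
        self.text = text
        self.whitespace = whitespace  # trailing whitespace: "" or " "


class ToySpan:
    def __init__(self, tokens, start, end, check_bounds):
        self.tokens = tokens
        self.start = start
        self.end = end
        # True models the newer behaviour (bounds check in __getitem__),
        # False models the older behaviour without the check.
        self.check_bounds = check_bounds

    def __len__(self):
        return self.end - self.start

    def __getitem__(self, i):
        # Translate a span-relative index into a document index.
        index = self.end + i if i < 0 else self.start + i
        if self.check_bounds and not (self.start <= index < self.end):
            raise IndexError("[E201] Span index out of range")
        return self.tokens[index]

    @property
    def text_with_ws(self):
        # Never indexes the span, so an empty span is harmless here.
        return "".join(t.text + t.whitespace
                       for t in self.tokens[self.start:self.end])

    @property
    def text(self):
        text = self.text_with_ws
        # Assumed latent bug: self[-1] is not a token of an empty span.
        if self[-1].whitespace:
            text = text[:-1]
        return text


tokens = [ToyToken("some", " "), ToyToken("thing", "")]
old_style = ToySpan(tokens, 1, 1, check_bounds=False)
new_style = ToySpan(tokens, 1, 1, check_bounds=True)
```

In this model `old_style.text` quietly reads the token before the span and returns an empty string, `new_style.text_with_ws` also returns an empty string, and `new_style.text` raises `IndexError`. A fix along these lines would presumably have `text` return early when `len(self) == 0`.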