Infinite loop if token texts don't match input text
It works with other text, but running this code causes an infinite loop. It does not like “im Anhang”.
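import stanfordnlp
from spacy_stanfordnlp import StanfordNLPLanguage

# Load the German StanfordNLP pipeline and wrap it as a spaCy language
snlp = stanfordnlp.Pipeline(lang="de")
nlp = StanfordNLPLanguage(snlp)
doc = nlp("im Anhang")
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_)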
Issue Analytics
- State:
- Created 5 years ago
- Comments: 5 (2 by maintainers)
Top GitHub Comments
Yes, in that case, "des enfants" would become a Doc consisting of the tokens ["de", "les", "enfants"], and doc.text would be "de les enfants". (So text == doc.text wouldn’t hold true anymore, which is usually a fundamental principle in spaCy.)

Edit: Wrote a little fix and am just trying it out with German. If this works, the compromise would be that if the tokens align with the text, you get tokens with whitespace information (token.whitespace_, token.text_with_ws and by proxy Doc.text, Span.text). If the tokens do not align with the input text, you get the tokens but no aligned whitespace information, so Doc.text and Span.text may not be perfectly accurate.

Edit 2: The fix is now available in the latest release. I think it’s an okay compromise.
Hey there, first of all, thank you for the great work with spaCy! It’s so nice 😃 When trying out the stanfordnlp + spaCy parser with the default parser for Italian, I get this result for prepositions with articles and apostrophes. I think the problem is sometimes the “wrong” apostrophe, but maybe if the MWT (multi-word token) were accessible in some way (I couldn’t figure out if it is), this could be solved? (Here they solve it with a ‘parent’ structure: https://stanfordnlp.github.io/stanfordnlp/mwt.html):
I just had a quick look at the French one and saw similar issues:
Thanks for the help 😃