Add 'head token index' attribute to words in nlp(doc)
When navigating a spaCy-processed document, e.g. –
from spacy.en import English
nlp = English()
doc = nlp('The old man shook hands with the young man.')
for word in doc:
    print(word, word.tag_, word.head)
– which outputs:
The DT man
old JJ man
man NN shook
shook VBD shook
hands NNS shook
with IN shook
the DT man
young JJ man
man NN with
. . shook
It would be great to have a token number stored with the head (i.e. word.headi or something like that). This would allow you to unambiguously identify the head (e.g. man_3 versus man_9 in this sentence).
Issue Analytics
- State:
- Created 7 years ago
- Comments:5 (2 by maintainers)
Top GitHub Comments
There’s no immediate plan to use the Edit transition. There aren’t enough speech users, and people don’t know they want disfluency detection :p. Try training a bidirectional LSTM to pre-process. It’ll probably work really well.
Each token has a .i attribute. So token.head.i should give you exactly what you want.
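A minimal sketch of that suggestion, for anyone landing here later. It assumes spaCy v3, where a Doc can be built by hand from words, heads and deps. The helper name head_indices and the demo function are hypothetical, the head positions are copied from the output in the issue rather than predicted by a parser, and the "dep" labels are placeholders.

```python
def head_indices(doc):
    """Pair every token with its head's document index via token.head.i."""
    return [(tok.text, tok.i, tok.head.text, tok.head.i) for tok in doc]


def demo():
    # Assumes spaCy v3 is installed; imported here so head_indices has no dependencies.
    import spacy
    from spacy.tokens import Doc

    nlp = spacy.blank("en")  # blank pipeline, so no trained model is required
    words = ["The", "old", "man", "shook", "hands",
             "with", "the", "young", "man", "."]
    # 0-based head positions taken from the output quoted in the issue.
    heads = [2, 2, 3, 3, 3, 3, 8, 8, 5, 3]
    # Placeholder labels: v3 expects deps whenever heads are supplied.
    deps = ["dep"] * len(words)
    doc = Doc(nlp.vocab, words=words, heads=heads, deps=deps)

    for text, i, head_text, head_i in head_indices(doc):
        print(f"{text}_{i} -> {head_text}_{head_i}")
```

Calling demo() prints one line per token. Note that .i is 0-based, so the two occurrences of "man" have i = 2 and i = 8 rather than the 1-based man_3 and man_9 used in the issue text.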