
Add 'head token index' attribute to words in nlp(doc)

Original GitHub issue: #549

When navigating a spaCy-processed document, for example:

# spaCy 1.x API (2016); the spacy.en module was removed in spaCy 2.0
from spacy.en import English
nlp = English()
doc = nlp('The old man shook hands with the young man.')
for word in doc:
    print(word, word.tag_, word.head)

This outputs (word, tag, head):

The DT man
old JJ man
man NN shook
shook VBD shook
hands NNS shook
with IN shook
the DT man
young JJ man
man NN with
. . shook

It would be great to have the head's token number stored on each word (e.g. word.headi or something like that). This would let you identify the head unambiguously (e.g. man_3 versus man_9 in this sentence, counting tokens from 1).

Issue Analytics

Created 7 years ago
Comments: 5 (2 by maintainers)

Top GitHub Comments


honnibal commented, Sep 2, 2016

There’s no immediate plan to use the Edit transition. There aren’t enough speech users, and people don’t know they want disfluency detection :p.

Try training a bidirectional LSTM to pre-process. It’ll probably work really well.


honnibal commented, Sep 2, 2016

Each token has a .i attribute, its index within the parent Doc, so token.head.i should give you exactly what you want.
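A minimal sketch of that suggestion, assuming spaCy v3 is installed. To avoid depending on a downloaded model, it builds the Doc by hand from the words, tags, and heads in the issue's output; the dependency labels are illustrative assumptions, not actual parser output.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hand-built parse mirroring the output in the issue, so no trained pipeline
# is needed. Heads are absolute (0-based) token positions, as the spaCy v3
# Doc constructor expects. The dep labels below are assumed for illustration.
words = ["The", "old", "man", "shook", "hands", "with", "the", "young", "man", "."]
tags = ["DT", "JJ", "NN", "VBD", "NNS", "IN", "DT", "JJ", "NN", "."]
heads = [2, 2, 3, 3, 3, 3, 8, 8, 5, 3]
deps = ["det", "amod", "nsubj", "ROOT", "dobj", "prep", "det", "amod", "pobj", "punct"]

doc = Doc(Vocab(), words=words, tags=tags, heads=heads, deps=deps)

for token in doc:
    # token.i is the token's index in the Doc; token.head.i is its head's index,
    # which tells the two occurrences of "man" apart.
    print(f"{token.text}_{token.i}", token.tag_, f"{token.head.text}_{token.head.i}")
```

Indices are 0-based, so the two occurrences of "man" come out as man_2 and man_8 (the issue's man_3 and man_9 count from 1). With a trained pipeline such as en_core_web_sm loaded via spacy.load, token.head.i works the same way, though the predicted heads may differ from the 2016 output.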
