Add 'head token index' attribute to words in nlp(doc)


When navigating a spaCy-processed document, e.g.:

from spacy.en import English
nlp = English()
doc = nlp('The old man shook hands with the young man.')
for word in doc:
    print(word, word.tag_, word.head)

which outputs:

The DT man
old JJ man
man NN shook
shook VBD shook
hands NNS shook
with IN shook
the DT man
young JJ man
man NN with
. . shook

It would be great to have a token number stored with the head (e.g. word.headi or something like that). This would let you identify the head unambiguously (e.g. man_3 versus man_9 in this sentence).

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
honnibal commented, Sep 2, 2016

There’s no immediate plan to use the Edit transition. There aren’t enough speech users, and people don’t know they want disfluency detection :p.

Try training a bidirectional LSTM to pre-process. It’ll probably work really well.

1 reaction
honnibal commented, Sep 2, 2016

Each token has a .i attribute. So token.head.i should give you exactly what you want.
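
A minimal sketch of that suggestion, assuming a recent spaCy (v3) install. So that it runs without downloading a model, the parse is typed in by hand from the output quoted in the issue rather than predicted: the `heads` list follows that output, while the `deps` labels are illustrative assumptions. With a real pipeline you would iterate over `nlp(text)` in the same way. Note that `.i` is 0-based, so the two occurrences of "man" come out as positions 2 and 8 rather than the 3 and 9 used in the issue.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

words = ["The", "old", "man", "shook", "hands", "with", "the", "young", "man", "."]

# Hand-written annotations (not model output). In spaCy v3, `heads` holds the
# absolute position of each token's head; the root points at itself.
heads = [2, 2, 3, 3, 3, 3, 8, 8, 5, 3]
# Dependency labels are illustrative assumptions.
deps = ["det", "amod", "nsubj", "ROOT", "dobj", "prep", "det", "amod", "pobj", "punct"]

doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

# token.i is the token's position in the document; token.head.i is its head's.
head_indices = [(token.text, token.i, token.head.text, token.head.i) for token in doc]

for text, i, head_text, head_i in head_indices:
    print(f"{text}_{i} -> {head_text}_{head_i}")
```

Because `token.head` is itself a `Token`, any other token attribute (such as `token.head.tag_`) is reachable the same way, so no separate head-index attribute is needed.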


