
Noun chunk info from token


Hello spaCy team!

It appears that there isn’t an option to determine whether any single token is part of a noun chunk (as determined from doc.noun_chunks), in the same way as token.ent_iob.

The main problem that I am trying to solve is merging noun_chunks in specific sentences.

Is this a feature that could be added? Or is there another solution?
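Until such a flag exists, per-token membership can be derived from doc.noun_chunks by collecting the token indices each chunk covers. The index bookkeeping itself is independent of spaCy, so here is a minimal spaCy-free sketch using plain half-open (start, end) spans (the example spans are made up):

```python
def token_chunk_flags(n_tokens, chunk_spans):
    """Return a list of booleans: True at index i if token i falls inside
    any half-open (start, end) chunk span, mirroring spaCy's Span indices."""
    in_chunk = [False] * n_tokens
    for start, end in chunk_spans:
        for i in range(start, end):
            in_chunk[i] = True
    return in_chunk

# 9 tokens, with hypothetical noun chunks covering tokens 0-3 and 6-8
flags = token_chunk_flags(9, [(0, 4), (6, 9)])
print(flags)  # [True, True, True, True, False, False, True, True, True]
```

With a real Doc, the chunk spans would come from `[(nc.start, nc.end) for nc in doc.noun_chunks]`.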

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Comments: 13 (5 by maintainers)

Top GitHub Comments

6 reactions
honnibal commented, Mar 22, 2017

Maybe we should have a span2doc function? I think this might take some pressure off the span objects.

1 reaction
Xeoncross commented, Sep 26, 2016

Trying to tie noun_chunks to specific sentences is a feature I’m trying to build as well. I altered your code, @owlas, for those trying to run it live on a document.

# `doc` is a parsed spaCy Doc, e.g. doc = spacy.load("en_core_web_sm")("...")
# Collect the index of every token that belongs to some noun chunk
noun_words = set(w.i for nc in doc.noun_chunks for w in nc)
for s in doc.sents:
    noun_from_chunks_in_sentence = [w for w in s if w.i in noun_words]
    print(s.text)
    print(noun_from_chunks_in_sentence)

I’m assuming this just needs to be changed to check whether the leftmost and rightmost tokens of the noun chunk fall inside the sentence’s boundaries.

for s in doc.sents:
    print(s.text)
    for nc in doc.noun_chunks:
        # The chunk lies inside the sentence when its token span is
        # contained in the sentence's token span
        if nc.start >= s.start and nc.end <= s.end:
            print("INSIDE: " + nc.text)
        else:
            print("OUT: " + nc.text)
        print(s.start, s.end)
        print(nc.start, nc.end)
        print("")
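The containment test in the snippet above is just interval arithmetic on token offsets, so it can be checked in isolation. A spaCy-free sketch with made-up spans:

```python
def split_chunks_by_sentence(sent_span, chunk_spans):
    """Partition half-open (start, end) chunk spans by whether they lie
    entirely inside the sentence's (start, end) token range."""
    s_start, s_end = sent_span
    inside = [c for c in chunk_spans if s_start <= c[0] and c[1] <= s_end]
    outside = [c for c in chunk_spans if c not in inside]
    return inside, outside

# Sentence covering tokens 0-4; chunks at (0, 2), (3, 5), and (4, 7)
inside, outside = split_chunks_by_sentence((0, 5), [(0, 2), (3, 5), (4, 7)])
print(inside)   # [(0, 2), (3, 5)]
print(outside)  # [(4, 7)]
```

With a real Doc, `sent_span` would be `(s.start, s.end)` for each sentence and `chunk_spans` would be `[(nc.start, nc.end) for nc in doc.noun_chunks]`.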

Top Results From Across the Web

  • extract noun-chunk from single token - python - Stack Overflow
    I am interested in writing a function that returns, for every token, the noun-chunk that (may) include that token. Something like: for tok...
  • Extracting noun chunks | Python Natural Language ...
    Noun chunks are spaCy Span objects and have all their properties. See the official documentation at https://spacy.io/api/token.
  • How to keep original noun chunk spans in Spacy after ...
    Due to the noun chunk merging step in the pipeline, “the new store” becomes a single token and there is no way to...
  • Chunking in NLP: decoded - Towards Data Science
    In short, Chunking means grouping of words/tokens into chunks. ... sentence is divided into two different chunks which are NP(noun phrase).
  • 7 Extracting Information from Text - NLTK
    We can match these noun phrases using a slight refinement of the first tag pattern above, i.e. <DT>?<JJ.*>*<NN.*>+. This will chunk any sequence...
