
Noun chunk info from token


Hello spaCy team!

It appears that there isn’t an option to determine whether any single token is part of a noun chunk (as determined from doc.noun_chunks), in the same way as token.ent_iob.

The main problem that I am trying to solve is merging noun_chunks in specific sentences.

Is this a feature that could be added? Or is there another solution?
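Until such a flag exists, per-token membership can be derived from doc.noun_chunks by collecting the token indices each chunk covers. The index bookkeeping itself is independent of spaCy, so here is a minimal spaCy-free sketch using plain half-open (start, end) spans (the example spans are made up):

```python
def token_chunk_flags(n_tokens, chunk_spans):
    """Return a list of booleans: True at index i if token i falls inside
    any half-open (start, end) chunk span, mirroring spaCy's Span indices."""
    in_chunk = [False] * n_tokens
    for start, end in chunk_spans:
        for i in range(start, end):
            in_chunk[i] = True
    return in_chunk

# 9 tokens, with hypothetical noun chunks covering tokens 0-3 and 6-8
flags = token_chunk_flags(9, [(0, 4), (6, 9)])
print(flags)  # [True, True, True, True, False, False, True, True, True]
```

With a real Doc, the chunk spans would come from `[(nc.start, nc.end) for nc in doc.noun_chunks]`.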

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Comments: 13 (5 by maintainers)

Top GitHub Comments

6 reactions
honnibal commented, Mar 22, 2017

Maybe we should have a span2doc function? I think this might take some pressure off the span objects.

1 reaction
Xeoncross commented, Sep 26, 2016

Trying to tie noun_chunks to specific sentences is a feature I’m trying to build as well. I altered your code, @owlas, for those trying to run it live on a document.

# `doc` is a parsed spaCy Doc, e.g. doc = spacy.load("en_core_web_sm")("...")
# Collect the index of every token that belongs to some noun chunk
noun_words = set(w.i for nc in doc.noun_chunks for w in nc)
for s in doc.sents:
    noun_from_chunks_in_sentence = [w for w in s if w.i in noun_words]
    print(s.text)
    print(noun_from_chunks_in_sentence)

I’m assuming this just needs to be changed to check whether the leftmost and rightmost tokens of the noun chunk fall inside the sentence’s boundaries.

for s in doc.sents:
    print(s.text)
    for nc in doc.noun_chunks:
        # The chunk lies inside the sentence when its token span is
        # contained in the sentence's token span
        if nc.start >= s.start and nc.end <= s.end:
            print("INSIDE: " + nc.text)
        else:
            print("OUT: " + nc.text)
        print(s.start, s.end)
        print(nc.start, nc.end)
        print("")
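The containment test in the snippet above is just interval arithmetic on token offsets, so it can be checked in isolation. A spaCy-free sketch with made-up spans:

```python
def split_chunks_by_sentence(sent_span, chunk_spans):
    """Partition half-open (start, end) chunk spans by whether they lie
    entirely inside the sentence's (start, end) token range."""
    s_start, s_end = sent_span
    inside = [c for c in chunk_spans if s_start <= c[0] and c[1] <= s_end]
    outside = [c for c in chunk_spans if c not in inside]
    return inside, outside

# Sentence covering tokens 0-4; chunks at (0, 2), (3, 5), and (4, 7)
inside, outside = split_chunks_by_sentence((0, 5), [(0, 2), (3, 5), (4, 7)])
print(inside)   # [(0, 2), (3, 5)]
print(outside)  # [(4, 7)]
```

With a real Doc, `sent_span` would be `(s.start, s.end)` for each sentence and `chunk_spans` would be `[(nc.start, nc.end) for nc in doc.noun_chunks]`.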

Top Results From Across the Web

  • extract noun-chunk from single token - python - Stack Overflow
    I am interested in writing a function that returns, for every token, the noun-chunk that (may) include that token. Something like: for tok...
  • Extracting noun chunks | Python Natural Language ...
    Noun chunks are spaCy Span objects and have all their properties. See the official documentation at https://spacy.io/api/token.
  • How to keep original noun chunk spans in Spacy after ...
    Due to the noun chunk merging step in the pipeline, “the new store” becomes a single token and there is no way to...
  • Chunking in NLP: decoded - Towards Data Science
    In short, Chunking means grouping of words/tokens into chunks. ... sentence is divided into two different chunks which are NP(noun phrase).
  • 7 Extracting Information from Text - NLTK
    We can match these noun phrases using a slight refinement of the first tag pattern above, i.e. <DT>?<JJ.*>*<NN.*>+. This will chunk any sequence...
