Weird behaviour using DependencyMatcher with OP attribute

How to reproduce the behaviour

From the docs, it is not clear how the OP attribute should behave when used in the DependencyMatcher, or whether it is supported at all.

Take this example:

import spacy
from spacy.matcher import DependencyMatcher


nlp = spacy.load("en_core_web_sm")

text = "The dress is beautiful"
doc = nlp(text)

matcher = DependencyMatcher(nlp.vocab)
pattern = [
    {"RIGHT_ID": "is", "RIGHT_ATTRS": {"LEMMA": "be"}},
    {
        "LEFT_ID": "is",
        "REL_OP": ">",
        "RIGHT_ID": "subj",
        "RIGHT_ATTRS": {"DEP": "nsubj"},
    },
    {
        "LEFT_ID": "is",
        "REL_OP": ">",
        "RIGHT_ID": "adj",
        "RIGHT_ATTRS": {"POS": "ADJ",},
    },
    {
        "LEFT_ID": "is",
        "REL_OP": ">",
        "RIGHT_ID": "not",
        "RIGHT_ATTRS": {"DEP": "neg", "OP": "?"},
    },
]
matcher.add("test", [pattern])


for _, tokens in matcher(doc):
    print("------------")
    print(tokens)
    for token in tokens:
        print(doc[token])

This will print out:

------------
[2, 1, 3, 1]
is
dress
beautiful
dress
------------
[2, 1, 3, 3]
is
dress
beautiful
beautiful

The last node in the pattern (called “not”) is never matched, yet we still get two matches whose fourth entry is an unrelated token. Using the * operator gives the same result. Matching the sentence “The dress is not beautiful” also gives a strange result:

------------
[2, 1, 4, 1]
is
dress
beautiful
dress
------------
[2, 1, 4, 3]
is
dress
beautiful
not
------------
[2, 1, 4, 4]
is
dress
beautiful
beautiful
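
In every unexpected match listed above, the fourth index repeats an index that already appears earlier in the same list, while the one sensible match, [2, 1, 4, 3], uses four distinct tokens. The following is a minimal sketch of a post-filter built on that observation. The helper name drop_repeated_matches is hypothetical and not part of spaCy, and the rule is only a heuristic inferred from the two outputs in this report, not a fix for the matcher itself.

```python
def drop_repeated_matches(matches):
    """Keep only matches whose token indices are all distinct.

    Hypothetical helper (not a spaCy API). `matches` is a list of
    token-index lists, as printed in the report above.
    """
    return [ids for ids in matches if len(set(ids)) == len(ids)]


# Token-index lists copied from the two outputs in this report.
without_negation = [[2, 1, 3, 1], [2, 1, 3, 3]]
with_negation = [[2, 1, 4, 1], [2, 1, 4, 3], [2, 1, 4, 4]]

print(drop_repeated_matches(without_negation))  # []
print(drop_repeated_matches(with_negation))     # [[2, 1, 4, 3]]
```

Note that for “The dress is beautiful” the filter leaves nothing at all, so the match without the negation is lost too. It only separates plausible from implausible results and does not make OP work.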

Info about spaCy

spaCy version: 3.0.0rc2
Platform: Linux-5.10.3-arch1-1-x86_64-with-glibc2.2.5
Python version: 3.8.7
Pipelines: en_core_web_sm (3.0.0a0), en_core_web_trf (3.0.0a0)

Top GitHub Comments

1 reaction

adrianeboyd commented, Jan 4, 2021

I think the current dependency matcher algorithm is only going to work correctly if the token pattern matches exactly one token. I think, at least initially, we should simply forbid OP in the patterns so you don’t get these kinds of weird results.
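
If every node has to match exactly one token, one possible workaround is to spell out the optional node as separate patterns: one that includes it and one that leaves it out. The sketch below does this in plain Python for nodes marked with "OP": "?". The helper expand_optional is hypothetical and not part of spaCy, and it assumes optional nodes are leaves, meaning no other node uses them as its LEFT_ID.

```python
from copy import deepcopy
from itertools import product


def expand_optional(pattern):
    """Expand nodes marked "OP": "?" into explicit pattern variants.

    Hypothetical helper (not a spaCy API). Returns one pattern per
    combination of kept/dropped optional nodes, with "OP" stripped.
    """
    for node in pattern:
        if node["RIGHT_ATTRS"].get("OP") not in (None, "?"):
            raise ValueError("only the '?' operator is handled in this sketch")

    optional_ids = [
        node["RIGHT_ID"] for node in pattern
        if node["RIGHT_ATTRS"].get("OP") == "?"
    ]
    variants = []
    for keep_flags in product([True, False], repeat=len(optional_ids)):
        dropped = {rid for rid, keep in zip(optional_ids, keep_flags) if not keep}
        variant = []
        for node in pattern:
            if node["RIGHT_ID"] in dropped:
                continue  # leave the optional node out of this variant
            if node.get("LEFT_ID") in dropped:
                raise ValueError("an optional node must not have dependents")
            node = deepcopy(node)  # do not modify the caller's pattern
            node["RIGHT_ATTRS"].pop("OP", None)
            variant.append(node)
        variants.append(variant)
    return variants


# The pattern from the report above.
pattern = [
    {"RIGHT_ID": "is", "RIGHT_ATTRS": {"LEMMA": "be"}},
    {"LEFT_ID": "is", "REL_OP": ">", "RIGHT_ID": "subj",
     "RIGHT_ATTRS": {"DEP": "nsubj"}},
    {"LEFT_ID": "is", "REL_OP": ">", "RIGHT_ID": "adj",
     "RIGHT_ATTRS": {"POS": "ADJ"}},
    {"LEFT_ID": "is", "REL_OP": ">", "RIGHT_ID": "not",
     "RIGHT_ATTRS": {"DEP": "neg", "OP": "?"}},
]

variants = expand_optional(pattern)
print([len(v) for v in variants])  # [4, 3]
```

The resulting list can be registered under a single key in the same way as in the report, for example matcher.add("test", variants). A negated sentence would then presumably match both variants, so the caller still has to decide which of the overlapping matches to keep.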
