Weird behaviour using DependencyMatcher with OP attribute

How to reproduce the behaviour

From the docs, it is not clear how the OP attribute should behave when used in the DependencyMatcher, or whether it is supported at all.

Take this example:

import spacy
from spacy.matcher import DependencyMatcher


nlp = spacy.load("en_core_web_sm")

text = "The dress is beautiful"
doc = nlp(text)

matcher = DependencyMatcher(nlp.vocab)
pattern = [
    {"RIGHT_ID": "is", "RIGHT_ATTRS": {"LEMMA": "be"}},
    {
        "LEFT_ID": "is",
        "REL_OP": ">",
        "RIGHT_ID": "subj",
        "RIGHT_ATTRS": {"DEP": "nsubj"},
    },
    {
        "LEFT_ID": "is",
        "REL_OP": ">",
        "RIGHT_ID": "adj",
        "RIGHT_ATTRS": {"POS": "ADJ",},
    },
    {
        "LEFT_ID": "is",
        "REL_OP": ">",
        "RIGHT_ID": "not",
        "RIGHT_ATTRS": {"DEP": "neg", "OP": "?"},
    },
]
matcher.add("test", [pattern])


for _, token_ids in matcher(doc):
    print("------------")
    print(token_ids)
    # Each entry is a token index into doc, one per pattern node
    for token_id in token_ids:
        print(doc[token_id])

This will print out:

------------
[2, 1, 3, 1]
is
dress
beautiful
dress
------------
[2, 1, 3, 3]
is
dress
beautiful
beautiful

The last node in the pattern (named “not”) has nothing to match in this sentence, yet we still get two matches in which the 4th entry is an unrelated token. Using the * operator gives the same result. Matching the sentence “The dress is not beautiful” also gives a strange result:

------------
[2, 1, 4, 1]
is
dress
beautiful
dress
------------
[2, 1, 4, 3]
is
dress
beautiful
not
------------
[2, 1, 4, 4]
is
dress
beautiful
beautiful
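
To make the mismatch easier to read, each match can be labelled with the RIGHT_ID of the pattern node it belongs to. This is a minimal sketch, assuming that the i-th entry of a match's token list corresponds to the i-th node of the pattern (which is consistent with the output above). `label_match` is a hypothetical helper, not part of spaCy, and the word list is typed by hand from the sentence rather than produced by a pipeline.

```python
def label_match(pattern, token_ids, words):
    """Pair each pattern node's RIGHT_ID with the word it was bound to."""
    return {node["RIGHT_ID"]: words[i] for node, i in zip(pattern, token_ids)}


# Only the RIGHT_IDs matter for labelling, so the pattern is abbreviated.
pattern = [
    {"RIGHT_ID": "is"},
    {"RIGHT_ID": "subj"},
    {"RIGHT_ID": "adj"},
    {"RIGHT_ID": "not"},
]
words = ["The", "dress", "is", "not", "beautiful"]

# The three matches reported for "The dress is not beautiful"
for token_ids in ([2, 1, 4, 1], [2, 1, 4, 3], [2, 1, 4, 4]):
    print(label_match(pattern, token_ids, words))
```

Read this way, only the second match binds the “not” node to the negation; the other two bind it to “dress” and “beautiful”.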

Info about spaCy

  • spaCy version: 3.0.0rc2
  • Platform: Linux-5.10.3-arch1-1-x86_64-with-glibc2.2.5
  • Python version: 3.8.7
  • Pipelines: en_core_web_sm (3.0.0a0), en_core_web_trf (3.0.0a0)

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments:7 (7 by maintainers)

Top GitHub Comments

1 reaction
adrianeboyd commented, Jan 4, 2021

I think the current dependency matcher algorithm is only going to work correctly if the token pattern matches exactly one token. I think, at least initially, we should simply forbid OP in the patterns so you don’t get these kinds of weird results.
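
One possible workaround, given that each node needs to match exactly one token, is to expand a pattern containing optional nodes into several OP-free patterns: one where the optional node is required and one where it is left out. The sketch below is not from the thread or from spaCy; `expand_optional` is a hypothetical helper, and it assumes that only "?" is used and that optional nodes are leaves (nothing else names them as LEFT_ID).

```python
import copy
from itertools import product


def expand_optional(pattern):
    """Return OP-free variants of a dependency pattern (hypothetical helper).

    Each node whose RIGHT_ATTRS contain OP "?" is either kept, with OP
    removed, or dropped. Other OP values are rejected.
    """
    for node in pattern:
        if node["RIGHT_ATTRS"].get("OP") not in (None, "?"):
            raise ValueError(f"unsupported OP in node {node['RIGHT_ID']!r}")

    optional = [n["RIGHT_ID"] for n in pattern if n["RIGHT_ATTRS"].get("OP") == "?"]
    parents = {n.get("LEFT_ID") for n in pattern}
    for name in optional:
        # Dropping the anchor or a node with children would break the tree.
        if name in parents or name == pattern[0]["RIGHT_ID"]:
            raise ValueError(f"optional node {name!r} must be a leaf")

    variants = []
    for keep in product([True, False], repeat=len(optional)):
        dropped = {name for name, k in zip(optional, keep) if not k}
        variant = []
        for node in pattern:
            if node["RIGHT_ID"] in dropped:
                continue
            node = copy.deepcopy(node)  # leave the caller's pattern untouched
            node["RIGHT_ATTRS"].pop("OP", None)
            variant.append(node)
        variants.append(variant)
    return variants


pattern = [
    {"RIGHT_ID": "is", "RIGHT_ATTRS": {"LEMMA": "be"}},
    {"LEFT_ID": "is", "REL_OP": ">", "RIGHT_ID": "subj", "RIGHT_ATTRS": {"DEP": "nsubj"}},
    {"LEFT_ID": "is", "REL_OP": ">", "RIGHT_ID": "adj", "RIGHT_ATTRS": {"POS": "ADJ"}},
    {"LEFT_ID": "is", "REL_OP": ">", "RIGHT_ID": "not", "RIGHT_ATTRS": {"DEP": "neg", "OP": "?"}},
]

for variant in expand_optional(pattern):
    print([node["RIGHT_ID"] for node in variant])
```

The resulting list could then be passed to matcher.add in place of [pattern]. The variant without the “not” node would presumably still match a negated sentence, so registering each variant under its own key may make it easier to tell which one produced a given match.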

0 reactions
github-actions[bot] commented, Oct 27, 2021

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
