unexpected label produced by custom-trained dependency parser

I am using the CLI training interface to train a custom tagger and parser. The dependency labels are a custom set of semantic labels, and the training data is converted from CoNLL format. I am not using the --base-model argument, so I believe I'm starting from a blank model. The output directory also does not exist prior to training. After training, the model sometimes outputs an unexpected dependency label ('dep') that is not part of my training data.

Info about spaCy

spaCy version: 2.2.3
Platform: Linux-4.4.0-18362-Microsoft-x86_64-with-debian-buster-sid
Python version: 3.7.6

This issue links to this Stack Overflow question.

import json

# Load the training and dev data (spaCy v2 JSON training format)
with open("/path/to/my/train_data.json", 'r') as j:
    contents_train = json.load(j)

with open("/path/to/my/dev_data.json", 'r') as j:
    contents_dev = json.load(j)

contents = contents_train + contents_dev

# Count how often each dependency label occurs across all tokens
labels = {}
for c in contents:
    for p in c['paragraphs']:
        for s in p['sentences']:
            for t in s['tokens']:
                labels[t['dep']] = labels.get(t['dep'], 0) + 1

print(labels)
{'compound': 139, 'ROOT': 171, '-': 386, 'modification': 122, 'proximity': 65, 'quality': 36, 'feature': 77, 'containment': 65, 'cuisine': 10, 'availability': 10, 'timing': 10, 'pricing': 4, 'negation': 3, 'directional': 6, 'destination': 16, 'attachment': 1, 'origin': 7, 'access': 1, 'accessibility': 1, 'quantification': 2, 'tmode': 2}

# train the model
! python -m spacy train en full_model_trained_custSem \
/path/to/my/train_data.json \
/path/to/my/dev_data.json \
--pipeline 'tagger,parser' \
--gold-preproc

# load trained model
import spacy

nlp = spacy.load('full_model_trained_custSem/model-best')

# test trained model: print each token with its dependency label and head
q = "best deli in Seattle"
rel_list = []
for t in nlp(q):
    rel_list.append(t.text + ' <-- ' + t.dep_ + ' -- ' + t.head.text)
print(rel_list)
['best <-- dep -- deli', 'deli <-- ROOT -- deli', 'in <-- - -- deli', 'Seattle <-- containment -- deli']
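
To make the mismatch explicit, the predicted labels can be compared against the set of labels counted in the training and dev data. The following is a minimal sketch in plain Python; the helper name `unexpected_labels` is hypothetical, and the two inputs are copied by hand from the outputs shown above rather than read from the model.

```python
def unexpected_labels(predicted, known):
    """Return predicted labels that are absent from the known set, in first-seen order."""
    seen = []
    for label in predicted:
        if label not in known and label not in seen:
            seen.append(label)
    return seen


# Keys of the label counts printed for the training and dev data above.
known = {
    'compound', 'ROOT', '-', 'modification', 'proximity', 'quality',
    'feature', 'containment', 'cuisine', 'availability', 'timing',
    'pricing', 'negation', 'directional', 'destination', 'attachment',
    'origin', 'access', 'accessibility', 'quantification', 'tmode',
}

# Labels predicted for "best deli in Seattle" in the output above.
predicted = ['dep', 'ROOT', '-', 'containment']

print(unexpected_labels(predicted, known))  # ['dep']
```

With the trained model loaded, `predicted` could presumably be built as `[t.dep_ for t in nlp(q)]` and `known` as `set(labels)`.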

Issue Analytics

Created 3 years ago
Comments: 12 (6 by maintainers)

Top GitHub Comments

fresejoerg commented, May 7, 2020

The original issue just re-emerged for me. A freshly trained version of my model is predicting 'dep' as a label. I verified that there are no '-' labels in my training or dev dataset, so the root cause appears to be unrelated to any special handling of this character.
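
A check like the one described in this comment could be sketched as follows. This is an illustration only, not the commenter's actual code: the helper name `tokens_with_label` and the sample document are hypothetical, and the token keys `orth` and `dep` are assumed from the spaCy v2 JSON training format used in the original post.

```python
def tokens_with_label(docs, label):
    """Return (doc_index, token_text) for every token whose 'dep' equals label."""
    hits = []
    for i, doc in enumerate(docs):
        for paragraph in doc['paragraphs']:
            for sentence in paragraph['sentences']:
                for token in sentence['tokens']:
                    if token.get('dep') == label:
                        hits.append((i, token.get('orth')))
    return hits


# Hypothetical one-sentence sample in the same nested structure.
sample = [{
    'paragraphs': [{
        'sentences': [{
            'tokens': [
                {'orth': 'best', 'dep': 'quality'},
                {'orth': 'deli', 'dep': 'ROOT'},
                {'orth': 'in', 'dep': '-'},
                {'orth': 'Seattle', 'dep': 'containment'},
            ]
        }]
    }]
}]

print(tokens_with_label(sample, '-'))    # [(0, 'in')]
print(tokens_with_label(sample, 'dep'))  # []
```

An empty result for a given label on both the training and dev files would confirm that the label never appears in the data.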
