unexpected label produced by custom-trained dependency parser

I am using the CLI training interface to train a custom tagger and parser. The dependency labels are a custom set of semantic labels. The training data is converted from CoNLL format. I am not using the --base-model argument, so I believe I’m starting from a blank model. Also, the output directory does not exist prior to training. After training, the model sometimes outputs an unexpected dependency label (‘dep’) that is not part of my training data.

Info about spaCy

  • spaCy version: 2.2.3
  • Platform: Linux-4.4.0-18362-Microsoft-x86_64-with-debian-buster-sid
  • Python version: 3.7.6

This issue is linked to a Stack Overflow question.

import json

# Load the training and dev data (JSON files passed to `spacy train` below).
with open("/path/to/my/train_data.json", 'r') as j:
    contents_train = json.load(j)

with open("/path/to/my/dev_data.json", 'r') as j:
    contents_dev = json.load(j)

contents = contents_train + contents_dev

# Count how often each dependency label occurs across both datasets.
labels = {}
for c in contents:
    for p in c['paragraphs']:
        for s in p['sentences']:
            for t in s['tokens']:
                labels[t['dep']] = labels.get(t['dep'], 0) + 1

print(labels)
{'compound': 139, 'ROOT': 171, '-': 386, 'modification': 122, 'proximity': 65, 'quality': 36, 'feature': 77, 'containment': 65, 'cuisine': 10, 'availability': 10, 'timing': 10, 'pricing': 4, 'negation': 3, 'directional': 6, 'destination': 16, 'attachment': 1, 'origin': 7, 'access': 1, 'accessibility': 1, 'quantification': 2, 'tmode': 2}
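
The same count can be written more compactly with `collections.Counter`. The following is only a sketch: the `sample` data is a tiny hypothetical stand-in that mirrors the nested `paragraphs` / `sentences` / `tokens` layout used above, and the function name `count_dep_labels` is made up for illustration.

```python
from collections import Counter


def count_dep_labels(contents):
    """Count dependency labels in data shaped like the JSON files above."""
    return Counter(
        tok["dep"]
        for doc in contents
        for para in doc["paragraphs"]
        for sent in para["sentences"]
        for tok in sent["tokens"]
    )


# Hypothetical sample, not taken from the real training data.
sample = [
    {"paragraphs": [{"sentences": [{"tokens": [
        {"orth": "best", "dep": "quality"},
        {"orth": "deli", "dep": "ROOT"},
        {"orth": "in", "dep": "-"},
        {"orth": "Seattle", "dep": "containment"},
    ]}]}]}
]

label_counts = count_dep_labels(sample)
has_dep_label = "dep" in label_counts  # True only if 'dep' occurs in the data

print(label_counts.most_common())
print("'dep' present in data:", has_dep_label)
```

Running `count_dep_labels(contents)` on the combined train and dev data should give the same counts as the dictionary printed above, and checking `"dep" in label_counts` confirms directly whether the label ever appears in the gold annotations.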

# train the model
! python -m spacy train en full_model_trained_custSem \
/path/to/my/train_data.json \
/path/to/my/dev_data.json \
--pipeline 'tagger,parser' \
--gold-preproc

import spacy

# load trained model
nlp = spacy.load('full_model_trained_custSem/model-best')

# test trained model: print each token with its dependency label and head
q = "best deli in Seattle"
rel_list = []
for t in nlp(q):
    rel_list.append(f"{t.text} <-- {t.dep_} -- {t.head.text}")
print(rel_list)
['best <-- dep -- deli', 'deli <-- ROOT -- deli', 'in <-- - -- deli', 'Seattle <-- containment -- deli']
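
To spot such labels systematically rather than by eye, the predicted labels can be compared against the label inventory from the training data. This is a minimal sketch: the helper name `unexpected_labels` is hypothetical, `known` is copied from the counts printed earlier, and `predicted` is copied from the output above. With the trained pipeline loaded, the list could instead be built as `[t.dep_ for t in nlp(q)]`.

```python
def unexpected_labels(predicted, known):
    """Return predicted labels that never occur in the training data."""
    return sorted(set(predicted) - set(known))


# Label inventory copied from the printed label counts.
known = {
    "compound", "ROOT", "-", "modification", "proximity", "quality",
    "feature", "containment", "cuisine", "availability", "timing",
    "pricing", "negation", "directional", "destination", "attachment",
    "origin", "access", "accessibility", "quantification", "tmode",
}

# Labels predicted for "best deli in Seattle", copied from the output above.
predicted = ["dep", "ROOT", "-", "containment"]

extra = unexpected_labels(predicted, known)
print(extra)  # prints ['dep']
```

Running this over a larger batch of test queries would show how often the out-of-inventory label appears and for which tokens.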

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 12 (6 by maintainers)

Top GitHub Comments

fresejoerg commented, May 7, 2020

The original issue just re-emerged for me. A freshly trained version of my model is predicting 'dep' as a label. I verified that there are no '-' labels in my training or dev dataset, so the root cause appears to be unrelated to any special handling of that character.
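
A check like the one described in this comment could look roughly as follows. It is a sketch under assumptions: the helper name `find_label` is hypothetical, the `sample` data is invented, and the token text is assumed to live under an `orth` key, so adjust the key if your files use a different one.

```python
def find_label(contents, label):
    """Return (doc index, sentence index, token text) for tokens tagged `label`."""
    hits = []
    for doc_idx, doc in enumerate(contents):
        sent_idx = 0
        for para in doc["paragraphs"]:
            for sent in para["sentences"]:
                for tok in sent["tokens"]:
                    if tok["dep"] == label:
                        # 'orth' is an assumed key; fall back to an empty string.
                        hits.append((doc_idx, sent_idx, tok.get("orth", "")))
                sent_idx += 1
    return hits


# Hypothetical sample in the same nested layout as the training files.
sample = [
    {"paragraphs": [{"sentences": [{"tokens": [
        {"orth": "deli", "dep": "ROOT"},
        {"orth": "in", "dep": "-"},
    ]}]}]}
]

dash_hits = find_label(sample, "-")
dep_hits = find_label(sample, "dep")
print(dash_hits)
print(dep_hits)
```

An empty result for both `"-"` and `"dep"` on the real train and dev data would support the observation that the label is not coming from the annotations themselves.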
