unexpected label produced by custom-trained dependency parser
I am using the CLI training interface to train a custom tagger and parser. The dependency labels are a custom set of semantic labels, and the training data was converted from CoNLL format. I am not using the `--base-model` argument, so I believe I'm starting from a blank model. The output directory also does not exist before training. After training, the model sometimes outputs an unexpected dependency label (`dep`) that does not appear anywhere in my training data.
Info about spaCy
- spaCy version: 2.2.3
- Platform: Linux-4.4.0-18362-Microsoft-x86_64-with-debian-buster-sid
- Python version: 3.7.6
This issue is linked to a related Stack Overflow question.
Label counts in the training and dev data:

```python
import json

# Load the converted training and dev data (spaCy v2 JSON training format)
with open("/path/to/my/train_data.json", "r") as j:
    contents_train = json.load(j)
with open("/path/to/my/dev_data.json", "r") as j:
    contents_dev = json.load(j)
contents = contents_train + contents_dev

# Count how often each dependency label occurs across both datasets
labels = {}
for c in contents:
    for p in c["paragraphs"]:
        for s in p["sentences"]:
            for t in s["tokens"]:
                if t["dep"] in labels:
                    labels[t["dep"]] += 1
                else:
                    labels[t["dep"]] = 1
print(labels)
```

Output:

```
{'compound': 139, 'ROOT': 171, '-': 386, 'modification': 122, 'proximity': 65, 'quality': 36, 'feature': 77, 'containment': 65, 'cuisine': 10, 'availability': 10, 'timing': 10, 'pricing': 4, 'negation': 3, 'directional': 6, 'destination': 16, 'attachment': 1, 'origin': 7, 'access': 1, 'accessibility': 1, 'quantification': 2, 'tmode': 2}
```
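The same count can be written more compactly with `collections.Counter`, which also makes it easy to check for one specific label: indexing a `Counter` with a missing key returns 0, so `counts["dep"]` or `counts["-"]` directly answers whether that label occurs in the data. This is a minimal sketch, assuming the spaCy v2 JSON training format used above (documents containing `paragraphs`, then `sentences`, then `tokens`, each token carrying a `dep` key); the helper name `count_dep_labels` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_dep_labels(docs: Iterable[dict]) -> Counter:
    """Count dependency labels in spaCy v2 JSON-format training documents.

    Assumes the nesting doc -> "paragraphs" -> "sentences" -> "tokens",
    with each token dict carrying a "dep" key.
    """
    return Counter(
        token["dep"]
        for doc in docs
        for paragraph in doc["paragraphs"]
        for sentence in paragraph["sentences"]
        for token in sentence["tokens"]
    )


# Tiny hypothetical sample, not taken from the issue's data
sample = [
    {"paragraphs": [{"sentences": [{"tokens": [
        {"orth": "best", "dep": "quality"},
        {"orth": "deli", "dep": "ROOT"},
    ]}]}]}
]
counts = count_dep_labels(sample)
print(counts.most_common())
print("dep occurrences:", counts["dep"])  # missing keys count as 0
```

For the real data, `count_dep_labels(contents)` would be called on the combined train and dev documents loaded above.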
```
# train the model (Jupyter notebook cell)
!python -m spacy train en full_model_trained_custSem \
    /path/to/my/train_data.json \
    /path/to/my/dev_data.json \
    --pipeline 'tagger,parser' \
    --gold-preproc
```
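Outside a notebook the `!` shell escape is not available. A minimal sketch, assuming spaCy v2.x is installed in the current interpreter's environment, that builds the same `spacy train` command as an argument list and runs it with `subprocess`; the helper names `build_train_cmd` and `train` are hypothetical, and refusing to reuse an existing output directory is a choice made for this sketch to mirror the fresh-directory setup described in the issue.

```python
import shlex
import subprocess
import sys
from pathlib import Path
from typing import List


def build_train_cmd(output_dir: str, train_path: str, dev_path: str) -> List[str]:
    """Build the spaCy v2 `train` invocation from the notebook cell as an argv list."""
    return [
        sys.executable, "-m", "spacy", "train", "en",
        output_dir, train_path, dev_path,
        # No shell quoting needed: each list item is passed as a single argument
        "--pipeline", "tagger,parser",
        "--gold-preproc",
    ]


def train(output_dir: str, train_path: str, dev_path: str) -> None:
    """Run training into a fresh output directory (hypothetical helper)."""
    if Path(output_dir).exists():
        raise FileExistsError(f"{output_dir} already exists; use a fresh directory")
    cmd = build_train_cmd(output_dir, train_path, dev_path)
    print("Running:", " ".join(shlex.quote(arg) for arg in cmd))
    # check=True raises CalledProcessError if training exits with a non-zero status
    subprocess.run(cmd, check=True)
```

Usage: `train("full_model_trained_custSem", "/path/to/my/train_data.json", "/path/to/my/dev_data.json")`. Using `sys.executable` ensures the `spacy` module is resolved from the same environment as the running script.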
```python
import spacy

# load trained model
nlp = spacy.load("full_model_trained_custSem/model-best")

# test trained model: print each token's label and head
q = "best deli in Seattle"
rel_list = []
for t in nlp(q):
    rel_list.append(t.text + " <-- " + t.dep_ + " -- " + t.head.text)
print(rel_list)
```

Output:

```
['best <-- dep -- deli', 'deli <-- ROOT -- deli', 'in <-- - -- deli', 'Seattle <-- containment -- deli']
```
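To flag labels such as `dep` programmatically rather than by eye, the predicted arcs can be compared against the label set counted from the training data. This is a minimal, spaCy-independent sketch; the helper `unseen_labels` and the `Arc` alias are hypothetical, the label set is copied from the counts printed above, and the arcs are transcribed from the parse of "best deli in Seattle" shown above.

```python
from typing import Iterable, List, Set, Tuple

# (token text, predicted dependency label, head text)
Arc = Tuple[str, str, str]


def unseen_labels(arcs: Iterable[Arc], train_labels: Set[str]) -> List[Arc]:
    """Return the arcs whose label never occurs in the training data."""
    return [arc for arc in arcs if arc[1] not in train_labels]


# Label set from the training/dev counts printed above
train_labels = {
    "compound", "ROOT", "-", "modification", "proximity", "quality", "feature",
    "containment", "cuisine", "availability", "timing", "pricing", "negation",
    "directional", "destination", "attachment", "origin", "access",
    "accessibility", "quantification", "tmode",
}

# Arcs transcribed from the model output above
arcs = [
    ("best", "dep", "deli"),
    ("deli", "ROOT", "deli"),
    ("in", "-", "deli"),
    ("Seattle", "containment", "deli"),
]
print(unseen_labels(arcs, train_labels))
```

With the trained model loaded, `arcs = [(t.text, t.dep_, t.head.text) for t in nlp(q)]` and `train_labels = set(labels)` (from the counting script) could be passed in instead, which makes it straightforward to scan many test sentences at once.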
Issue Analytics
- Created 3 years ago
- Comments:12 (6 by maintainers)
Top GitHub Comments
The original issue just re-emerged for me. A freshly trained version of my model is predicting `dep` as a label. I verified that no `-` labels are in my training or dev dataset, so it appears that the root cause is unrelated to special handling for this character.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.