Differences in dependency trees built by models of different versions
Hello!
I trained a tagger/parser for Portuguese using spaCy 2.1.8 and got a consistent model (let's call it model A), but when I upgraded to spaCy 2.2 I discovered it didn't work the same way in the newer version. I trained a new model (model B) for this version using the exact same corpus and got metrics (tokenizer, tags, UAS and LAS) relatively close to those of model A, as shown in the table below; a rough sketch of how such an evaluation could be reproduced follows the table.
| metric    | model A (spaCy 2.1.8) | model B (spaCy 2.2) |
|-----------|-----------------------|---------------------|
| tokenizer | 98.55                 | 98.35               |
| tags      | 94.71                 | 96.98               |
| UAS       | 85.76                 | 85.48               |
| LAS       | 82.52                 | 82.12               |
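For context, the metrics above come from evaluating each model on the same held-out data. Something along these lines can reproduce that kind of evaluation with the spaCy 2.x API; the model directory and corpus file names below are placeholders, not my actual paths.

```python
# Rough sketch of an evaluation run with the spaCy 2.x API; paths are placeholders.
import spacy
from spacy.gold import GoldCorpus

nlp = spacy.load("model_b")                    # placeholder model directory
corpus = GoldCorpus("train.json", "dev.json")  # placeholder corpus files (spaCy 2.x JSON format)
dev_docs = list(corpus.dev_docs(nlp, gold_preproc=False))

# Language.evaluate returns a Scorer with token, tag and dependency accuracies.
scorer = nlp.evaluate(dev_docs)
print("tokenizer", scorer.token_acc)
print("tags     ", scorer.tags_acc)
print("uas      ", scorer.uas)
print("las      ", scorer.las)
```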
However, when the two models are used to annotate the same sentence, they give different results. For example, for the sentence *[...] ou seja será necessários dois planetas para suprir a demanda [...]* (roughly: "in other words, two planets will be needed to meet the demand"), model A builds the following dependency tree:
Meanwhile, model B builds a slightly different tree, as shown below.
I know there is a difference in how lemmas are applied between spaCy 2.1 and 2.2, but are there any other changes that could affect the model's behaviour? How can two models trained on the exact same corpus, with similar metrics, diverge when constructing dependency trees?
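For reference, here is a minimal sketch of how the two parses could be compared side by side; `model_a` and `model_b` are placeholder paths, not the actual model names.

```python
# Minimal sketch: compare the dependency arcs produced by the two models.
# "model_a" and "model_b" are placeholder paths for the two trained models.
import spacy

text = "ou seja será necessários dois planetas para suprir a demanda"

nlp_a = spacy.load("model_a")   # model trained with spaCy 2.1.8
nlp_b = spacy.load("model_b")   # model trained with spaCy 2.2

doc_a = nlp_a(text)
doc_b = nlp_b(text)

# Print token, dependency label and head for each model, flagging divergences.
# This assumes both models tokenize the sentence identically.
for tok_a, tok_b in zip(doc_a, doc_b):
    same = tok_a.dep_ == tok_b.dep_ and tok_a.head.i == tok_b.head.i
    marker = "  " if same else "=>"
    print(f"{marker} {tok_a.text:12} A: {tok_a.dep_:10} <- {tok_a.head.text:12} "
          f"B: {tok_b.dep_:10} <- {tok_b.head.text}")
```

Rendering both docs with `spacy.displacy` (`style="dep"`) also makes the structural differences easy to spot visually.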
Issue Analytics
- Created: 3 years ago
- Reactions: 4
- Comments: 5 (2 by maintainers)
Top GitHub Comments
Hi everyone! Same here… I noticed the lemmatization difference between the models, but that wasn't a big issue; then I realized some sentence structures came out differently and some rules based on dependencies no longer worked so well.
I'm facing the very same problem here, guys. I'm stuck on spaCy 2.1 because of that weird tree behaviour.