Model differences


For English alone, there are three models available.

However, there is not much detail on how they differ.

Model              Size     Contents
en_core_web_sm     50 MB    Vocab, syntax, entities, word vectors
en_core_web_md     1 GB     Vocab, syntax, entities, word vectors
en_depent_web_md   328 MB   Vocab, syntax, entities

Can you provide some description of their accuracy, entity type recognition and intended use cases? That would be helpful.
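
One quick way to see in practice what a given package provides is to load it and inspect the pipeline and its vectors. Below is a minimal sketch, assuming a recent spaCy release and that the packages are installed (e.g. via python -m spacy download en_core_web_md):

    import spacy

    nlp = spacy.load("en_core_web_md")
    doc = nlp("The quick brown fox jumps over the lazy dog.")

    print("pipeline components: ", nlp.pipe_names)              # e.g. tagger, parser, ner
    print("ships word vectors:  ", doc[0].has_vector)           # False if no pretrained vectors
    print("fox ~ dog similarity:", doc[3].similarity(doc[8]))   # only meaningful with vectors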

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 8 (2 by maintainers)

Top GitHub Comments

3 reactions
ines commented, Apr 27, 2017

Yes, that’s a good idea! I think the model releases would be a good place for this info as well, and it could be combined with the accuracy numbers.

Just edited the release notes of the new French model as an example: https://github.com/explosion/spacy-models/releases/tag/fr_depvec_web_lg-1.0.0

Will start updating the other models as well.

2 reactions
ines commented, Apr 24, 2017

Thanks for opening this issue – since this question has come up before, I agree that this should definitely be more clear in the docs. I’ll just post all notes here so we can discuss them and add them to the docs.

Differences and accuracy

Most differences are obviously statistical. In general, we do expect larger models to be “better” and more accurate overall. Ultimately, it depends on your use case and requirements. People have reported pretty good results with the smaller model, so we usually recommend trying that first, writing a few tests specific to your use case and then comparing the results against a larger model, if necessary.
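
To make that advice concrete, here is a small sketch of what such a comparison could look like, assuming both packages are installed; the test sentences below are only placeholders for whatever your application actually cares about:

    import spacy

    # Replace these with examples that are representative of your own data.
    test_sentences = [
        "Google acquired DeepMind in 2014.",
        "Send the quarterly report to Dr. Smith in London by Friday.",
    ]

    for model_name in ("en_core_web_sm", "en_core_web_md"):
        nlp = spacy.load(model_name)
        print(model_name)
        for text in test_sentences:
            doc = nlp(text)
            print("  ", text)
            print("    entities:", [(ent.text, ent.label_) for ent in doc.ents])
            print("    pos tags:", [(t.text, t.pos_) for t in doc])

If the smaller model already handles your examples well, the larger download may not be worth it.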

We’re also going to compile a better list of accuracy numbers and distribute them with each model, for example in its meta.json.

Model                              Parser accuracy   POS tagging accuracy   NER accuracy
en_core_web_sm                     ~89%              coming                 coming
en_core_web_md, en_depent_web_md   ~90.6%            coming                 coming
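
Once those numbers ship with the packages, they should also be readable at runtime: the contents of a model’s meta.json are exposed on the loaded pipeline as nlp.meta. A small sketch, where the "accuracy" key is an assumption about the eventual format rather than a documented field:

    import spacy

    nlp = spacy.load("en_core_web_sm")
    meta = nlp.meta  # dict with the contents of the package's meta.json

    print(meta.get("lang"), meta.get("name"), meta.get("version"))
    # "accuracy" is a hypothetical key here – the exact field name depends on
    # the format the spaCy team settles on for distributing these numbers.
    print(meta.get("accuracy", "no accuracy numbers in this release"))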

Model releases and release notes

All models are published as GitHub releases and their release notes contain more detailed info. Going forward, we’ll also add a “Changes” section to new model releases that’ll list all updates since the last release, to give you a better idea of how that model is different. You can see an example of that in the pre-release of an alpha model we’re currently testing.

Model naming conventions

In general, spaCy expects all model packages to follow the naming convention of [lang]_[name]. For our models, we also chose to divide the name into three components:

Component   Description
type        Model capabilities (e.g. core for a general-purpose model with vocabulary, syntax, entities and word vectors, or depent for vocab, syntax and entities only)
genre       Type of text the model is trained on (e.g. web for web text, news for news text)
size        Model size indicator (sm, md or lg)

For example, en_depent_web_md is a medium-sized English model trained on written web text (blogs, news, comments), that includes vocabulary, syntax and entities.

I hope those naming conventions aren’t too confusing – but we felt it was necessary to decide on a scheme like this upfront to make sure we don’t end up with confusing or indistinguishable model names. Especially since there will be many more models in the future – either published by us, or by the community. (For example, if you were to train a Spanish NER model on dialog text, you’d call it es_ent_dialog_md and it’d be clear what it is.)
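
Purely as an illustration of that scheme (this is not part of the spaCy API), the full name decomposes mechanically into its four parts:

    # Throwaway helper showing how the [lang]_[type]_[genre]_[size] convention
    # decomposes – not something spaCy itself provides.
    def parse_model_name(model_name):
        lang, model_type, genre, size = model_name.split("_")
        return {"lang": lang, "type": model_type, "genre": genre, "size": size}

    print(parse_model_name("en_depent_web_md"))
    # {'lang': 'en', 'type': 'depent', 'genre': 'web', 'size': 'md'}
    print(parse_model_name("es_ent_dialog_md"))
    # {'lang': 'es', 'type': 'ent', 'genre': 'dialog', 'size': 'md'}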

✅ TODO

  • come up with generalised format to distribute the accuracies with the models
  • add figures to the model meta.json files, docs and releases
  • update docs with more information on model differences, how to pick the right model etc.