
The training named entity recognizer example is not working.


The simple training example in your documentation is not working. It outputs the following error when I try it:

TypeError                                 Traceback (most recent call last)
<ipython-input-4-2b36816f69f9> in <module>()
      7
      8 doc = Doc(vocab, words=['Who', 'is', 'Shaka', 'Khan', '?'])
----> 9 entity.update(doc, ['O', 'O', 'B-PERSON', 'L-PERSON', 'O'])
     10
     11 entity.model.end_training()

TypeError: Argument 'gold' has incorrect type (expected spacy.gold.GoldParse, got list)

I also had to import unicode_literals from __future__. I'd like to train my own model on my own training data and add custom named entity labels. I saw the full example in the train_ner example file, but I don't understand why the training data is formatted this way:

train_data = [
    ('Who is Shaka Khan?', [(len('Who is '), len('Who is Shaka Khan'), 'PERSON')]),
    ('I like London and Berlin.', [(len('I like '), len('I like London'), 'LOC'),
                                   (len('I like London and '), len('I like London and Berlin'), 'LOC')])
]

Also, how much data is needed to train your own model?

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
honnibal commented, Oct 19, 2016

Fix should be up on the site now.

1 reaction
honnibal commented, Oct 19, 2016

Thanks, will fix.

The data format is (start_offset, end_offset, label) — the offsets are character offsets within the string. You can also supply BILUO tags.
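The offset format is easier to see with concrete numbers. The plain-Python check below needs no spaCy. It evaluates the train_data from the question and slices each text with its offsets. The end offset is exclusive, matching Python slice semantics, so text[start:end] should be exactly the entity's surface text. The variable name recovered is just for illustration. Running a similar check on your own data can catch off-by-one offsets before training.

```python
# The training data from the question: each entity is a
# (start_offset, end_offset, label) triple of character offsets.
train_data = [
    ('Who is Shaka Khan?',
     [(len('Who is '), len('Who is Shaka Khan'), 'PERSON')]),
    ('I like London and Berlin.',
     [(len('I like '), len('I like London'), 'LOC'),
      (len('I like London and '), len('I like London and Berlin'), 'LOC')]),
]

recovered = []
for text, entities in train_data:
    for start, end, label in entities:
        # end is exclusive, so text[start:end] is exactly the entity text.
        recovered.append((start, end, label, text[start:end]))

for row in recovered:
    print(row)
# (7, 17, 'PERSON', 'Shaka Khan')
# (7, 13, 'LOC', 'London')
# (18, 24, 'LOC', 'Berlin')
```

The len(...) calls are only a readable way of writing the integers 7, 17, 13, 18 and 24. Hard-coded numbers would mean the same thing.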

I haven’t pushed documentation for the relevant class, GoldParse, yet. You can see the class in spacy/gold.pyx. You use it like this:


import spacy.gold
gold = spacy.gold.GoldParse(doc, entities=offsets_or_tags)
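Since entities may be given either as character offsets or as BILUO tags, it can help to see how the two forms correspond. BILUO stands for B = begin, I = inside, L = last, U = unit (a single-token entity), O = outside. Below is a minimal pure-Python sketch of the offsets-to-tags mapping. It assumes the text has already been split into words that appear, in order, in the string. The helpers token_spans and offsets_to_biluo are hypothetical and written for illustration only. They are not spaCy's API and not how GoldParse is implemented internally.

```python
# Minimal sketch: convert (start, end, label) character offsets into BILUO tags.
# token_spans and offsets_to_biluo are hypothetical helpers for illustration;
# they are not part of spaCy's API.


def token_spans(text, words):
    """Return the (start, end) character offsets of each word, in order."""
    spans = []
    pos = 0
    for word in words:
        start = text.index(word, pos)  # raises ValueError if word is missing
        end = start + len(word)
        spans.append((start, end))
        pos = end
    return spans


def offsets_to_biluo(text, words, entities):
    """Return one BILUO tag per word for character-offset entity triples."""
    spans = token_spans(text, words)
    tags = ["O"] * len(words)
    for ent_start, ent_end, label in entities:
        # Only tokens lying entirely inside [ent_start, ent_end) are tagged;
        # partially covered tokens stay "O" in this simplified sketch.
        inside = [i for i, (s, e) in enumerate(spans)
                  if s >= ent_start and e <= ent_end]
        if not inside:
            continue
        if len(inside) == 1:
            tags[inside[0]] = "U-" + label      # Unit: single-token entity
        else:
            tags[inside[0]] = "B-" + label      # Begin
            for i in inside[1:-1]:
                tags[i] = "I-" + label          # Inside
            tags[inside[-1]] = "L-" + label     # Last
    return tags


print(offsets_to_biluo("Who is Shaka Khan?",
                       ["Who", "is", "Shaka", "Khan", "?"],
                       [(7, 17, "PERSON")]))
# ['O', 'O', 'B-PERSON', 'L-PERSON', 'O']

print(offsets_to_biluo("I like London and Berlin.",
                       ["I", "like", "London", "and", "Berlin", "."],
                       [(7, 13, "LOC"), (18, 24, "LOC")]))
# ['O', 'O', 'U-LOC', 'O', 'U-LOC', 'O']
```

The first result is the same tag list that was passed to entity.update in the traceback. So the offsets from train_data and those tags describe the same annotation.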

It’s hard to predict how much data you’ll need to train your entity recognition model.

If you’re doing this commercially, you might be interested in the custom model service Ines and I have just launched, as Explosion AI. We want to offer an on-demand experience, as opposed to normal (painful) consulting.

The idea is to give you a super-quick turnaround and a try-before-you-buy arrangement: we’ll host the model as an API for you. If you like it, you buy the model and the annotations we used to build it.

Top Results From Across the Web

SpaCy NER training example from version 1.5.0 doesn't work ...
I tried to use the training example here: https://github.com/explosion/spaCy/blob/master/examples/training/train_ner.py with SpaCy 1.6.0.
spaCy custom NER are not working for new data set
I am trying to create few custom NER for my use case. This is a sample of my training data: [[' webex enable...
Training a custom Named Entity Recognizer with Spacy - lvngd
During training, the model learns by looking at each text example, and for each word tries to predict the appropriate named entity label....
How to Train a spaCy NER model (Named Entity Recognition ...
In this video, we use the training set that we created in the last video via the spaCy EntityRuler that we created in...
Training Custom NER models in SpaCy to auto-detect named ...
4. Format of the training examples ... spaCy accepts training data as list of tuples. Each tuple should contain the text and a...