The named entity recognizer training example is not working.

The simple training example in your documentation is not working. When I tried it, it raised the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-4-2b36816f69f9> in <module>()
      7
      8 doc = Doc(vocab, words=['Who', 'is', 'Shaka', 'Khan', '?'])
----> 9 entity.update(doc, ['O', 'O', 'B-PERSON', 'L-PERSON', 'O'])
     10
     11 entity.model.end_training()

TypeError: Argument 'gold' has incorrect type (expected spacy.gold.GoldParse, got list)

I also had to import unicode_literals. I'd like to train my own model on my own training data and add custom named entity labels. I saw the full example in the train_ner example file, but I don't understand why the training data is formatted this way:

train_data = [
    ('Who is Shaka Khan?',
     [(len('Who is '), len('Who is Shaka Khan'), 'PERSON')]),
    ('I like London and Berlin.',
     [(len('I like '), len('I like London'), 'LOC'),
      (len('I like London and '), len('I like London and Berlin'), 'LOC')])
]

How much data is needed to train your own model?
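The len() calls in this train_data are just a readable way of writing character offsets into each string. A quick sketch in plain Python (no spaCy needed) evaluates them and slices the text to show which substring each tuple marks. The end offset behaves like a Python slice end, so it is exclusive:

```python
# train_data copied from the train_ner example: each entry pairs a text
# with (start, end, label) tuples, where start/end are character offsets.
train_data = [
    ('Who is Shaka Khan?',
     [(len('Who is '), len('Who is Shaka Khan'), 'PERSON')]),
    ('I like London and Berlin.',
     [(len('I like '), len('I like London'), 'LOC'),
      (len('I like London and '), len('I like London and Berlin'), 'LOC')]),
]

for text, entities in train_data:
    for start, end, label in entities:
        # Slicing the text with the offsets recovers the entity string.
        print(start, end, label, repr(text[start:end]))

# 7 17 PERSON 'Shaka Khan'
# 7 13 LOC 'London'
# 18 24 LOC 'Berlin'
```

Writing len('Who is ') instead of the literal 7 simply saves counting characters by hand.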

Top GitHub Comments

honnibal commented, Oct 19, 2016

Fix should be up on the site now.

honnibal commented, Oct 19, 2016

Thanks, will fix.

The data format is (start_offset, end_offset, label), where the offsets are character offsets within the string. You can also supply BILUO tags.
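Since the offsets are character positions while BILUO tags are assigned per token, converting one into the other means mapping each token back to its character span. The helper offsets_to_biluo below is a hypothetical illustration in plain Python, not a spaCy function. It assumes the words occur in the text in order and that entity boundaries line up with token boundaries; a token only partially covered by an entity stays 'O'. On the Shaka Khan sentence it reproduces the tags from the original report:

```python
def offsets_to_biluo(text, words, entities):
    """Convert (start, end, label) character offsets into BILUO tags.

    Hypothetical helper for illustration, not part of spaCy.
    B = begin, I = inside, L = last, U = unit (single token), O = outside.
    """
    # Find each token's character span by scanning forward through the text.
    spans = []
    pos = 0
    for word in words:
        start = text.index(word, pos)
        end = start + len(word)
        spans.append((start, end))
        pos = end

    tags = ['O'] * len(words)
    for ent_start, ent_end, label in entities:
        # Indices of tokens lying entirely inside the entity's character span.
        inside = [i for i, (s, e) in enumerate(spans)
                  if s >= ent_start and e <= ent_end]
        if not inside:
            continue
        if len(inside) == 1:
            tags[inside[0]] = 'U-' + label
        else:
            tags[inside[0]] = 'B-' + label
            for i in inside[1:-1]:
                tags[i] = 'I-' + label
            tags[inside[-1]] = 'L-' + label
    return tags


print(offsets_to_biluo('Who is Shaka Khan?',
                       ['Who', 'is', 'Shaka', 'Khan', '?'],
                       [(7, 17, 'PERSON')]))
# ['O', 'O', 'B-PERSON', 'L-PERSON', 'O']
```

Single-token entities such as London and Berlin in the second training sentence come out as U-LOC.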

I haven’t pushed documentation for the relevant class, GoldParse, yet. You can see the class in spacy/gold.pyx. You use it like this:


gold = spacy.gold.GoldParse(doc, entities=offsets_or_tags)

It’s hard to predict how much data you’ll need to train your entity recognition model.

If you’re doing this commercially, you might be interested in the custom model service Ines and I have just launched, as Explosion AI. We want to offer an on-demand experience, as opposed to normal (painful) consulting.

The idea is to give you a super quick turnaround and a try-before-you-buy arrangement: we’ll host the model as an API for you. If you like it, you buy the model and the annotations we used to build it.