The training named entity recognizer example is not working.
The simple training example in your documentation is not working. It outputs the following error when I tried it:
TypeError                                 Traceback (most recent call last)
<ipython-input-4-2b36816f69f9> in <module>()
      7
      8 doc = Doc(vocab, words=['Who', 'is', 'Shaka', 'Khan', '?'])
----> 9 entity.update(doc, ['O', 'O', 'B-PERSON', 'L-PERSON', 'O'])
     10
     11 entity.model.end_training()

TypeError: Argument 'gold' has incorrect type (expected spacy.gold.GoldParse, got list)
I also had to import unicode_literals. I’d like to train my own model using my own training data and add custom named entity labels. I saw the full example in the train_ner example file, but I do not understand why the training data is formatted this way:

    train_data = [
        (
            'Who is Shaka Khan?',
            [(len('Who is '), len('Who is Shaka Khan'), 'PERSON')]
        ),
        (
            'I like London and Berlin.',
            [(len('I like '), len('I like London'), 'LOC'),
             (len('I like London and '), len('I like London and Berlin'), 'LOC')]
        )
    ]
Also, how much data is needed to train your own model?
Issue Analytics
- Created 7 years ago
- Comments:5 (2 by maintainers)
Top GitHub Comments
Fix should be up on the site now.
Thanks, will fix.
The data format is (start_offset, end_offset, label) — the offsets are character offsets within the string. You can also supply BILUO tags.
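To make the relationship between the two formats concrete, here is a minimal sketch in plain Python that turns (start_offset, end_offset, label) character offsets into per-token BILUO tags. The helper name offsets_to_biluo and the regex tokenizer are assumptions made for illustration only; spaCy's own tokenizer is more sophisticated and may split text differently.

```python
import re


def offsets_to_biluo(text, entities):
    """Convert (start, end, label) character offsets to BILUO tags.

    Hypothetical helper for illustration; not part of spaCy.
    """
    # Simplified tokenizer (an assumption): runs of word characters,
    # or single punctuation marks, each with its character span.
    tokens = [(m.group(), m.start(), m.end())
              for m in re.finditer(r"\w+|[^\w\s]", text)]
    tags = ["O"] * len(tokens)  # "O" marks tokens outside any entity

    for start, end, label in entities:
        # Indices of tokens that lie entirely inside the entity span.
        covered = [i for i, (_, s, e) in enumerate(tokens)
                   if s >= start and e <= end]
        if not covered:
            continue
        if len(covered) == 1:
            tags[covered[0]] = "U-" + label   # Unit: single-token entity
        else:
            tags[covered[0]] = "B-" + label   # Begin
            for i in covered[1:-1]:
                tags[i] = "I-" + label        # Inside
            tags[covered[-1]] = "L-" + label  # Last
    return [tok for tok, _, _ in tokens], tags


words, tags = offsets_to_biluo(
    'Who is Shaka Khan?',
    [(len('Who is '), len('Who is Shaka Khan'), 'PERSON')],
)
print(list(zip(words, tags)))
```

For the first training example, this produces the same tag sequence that the original snippet passed to entity.update: O, O, B-PERSON, L-PERSON, O. Single-token entities such as London get a U- tag.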
I haven’t pushed documentation for the relevant class, GoldParse, yet. You can see the class in spacy/gold.pyx. You use it like this:

It’s hard to predict how much data you’ll need to train your entity recognition model.
If you’re doing this commercially, you might be interested in the custom model service Ines and I have just launched, as Explosion AI. We want to offer an on-demand experience, as opposed to normal (painful) consulting.
The idea is to give you a super quick turn around and a try-before-you-buy arrangement — we’ll host the model as an API for you. If you like it, you buy the model and the annotations we used to build it.