train_ner.py train data format to spaCy's json
Hi, I’m trying to use the CLI train command to train a NER model. I was able to train one by following the example in train_ner.py, where the data needs to be formatted like this:
TRAIN_DATA = [
("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}),
("I like London and Berlin.", {"entities": [(7, 13, "LOC"), (18, 24, "LOC")]}),
]
I now want to use the more powerful CLI train command, but all my data is in the format above. Is there an existing script for this conversion? As far as I can see, this isn’t supported by the CLI convert command.
Thanks.
Your Environment
- spaCy version: 2.2.4
- Platform: Linux-4.19.104+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.6.9
- Models: en, es
Issue Analytics
- State:
- Created 3 years ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
You’re right that this has been lacking. For spaCy v3, we’re working on an overhaul of the convert function and the data formats in general, which should hopefully make all of this more intuitive!
Here’s my Stack Overflow answer on how to do this: https://stackoverflow.com/a/59209377/461847
It would probably make sense to add an example script to do this, since this is the main missing step for people who want to move from the super simple example training scripts to real training with the train CLI.
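As a rough illustration of what such a conversion involves, here is a minimal sketch that turns the TRAIN_DATA tuples from the question into a nested JSON structure with per-token BILUO tags. It is not the script from the linked answer and does not use spaCy at all: the regex tokenizer is a naive stand-in for spaCy's tokenizer, the helper names (tokenize, biluo_tags, to_spacy_json) are hypothetical, and the field names (id, paragraphs, raw, sentences, tokens, orth, ner) are an assumption based on the spaCy v2 JSON training format, so check them against the data formats documentation for your version.

```python
import json
import re

TRAIN_DATA = [
    ("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}),
    ("I like London and Berlin.", {"entities": [(7, 13, "LOC"), (18, 24, "LOC")]}),
]


def tokenize(text):
    """Naive stand-in tokenizer (assumption): (start, end, token_text) triples."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def biluo_tags(tokens, entities):
    """Assign BILUO tags to tokens from (start, end, label) character offsets."""
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        # Indices of tokens that lie entirely inside the entity span.
        inside = [i for i, (s, e, _) in enumerate(tokens) if s >= start and e <= end]
        if not inside or tokens[inside[0]][0] != start or tokens[inside[-1]][1] != end:
            raise ValueError(f"entity {(start, end, label)} does not align with token boundaries")
        if len(inside) == 1:
            tags[inside[0]] = f"U-{label}"
        else:
            tags[inside[0]] = f"B-{label}"
            for i in inside[1:-1]:
                tags[i] = f"I-{label}"
            tags[inside[-1]] = f"L-{label}"
    return tags


def to_spacy_json(train_data):
    """Build one JSON document per example (assumed spaCy v2 layout)."""
    docs = []
    for doc_id, (text, annotations) in enumerate(train_data):
        tokens = tokenize(text)
        tags = biluo_tags(tokens, annotations.get("entities", []))
        sentence = {
            "tokens": [
                {"id": i, "orth": tok, "ner": tag}
                for i, ((_, _, tok), tag) in enumerate(zip(tokens, tags))
            ]
        }
        docs.append({"id": doc_id, "paragraphs": [{"raw": text, "sentences": [sentence]}]})
    return docs


converted = to_spacy_json(TRAIN_DATA)
print(json.dumps(converted[0], indent=2))
```

To use the result with the train CLI, the list could be written to a file with json.dump. For real data, tokenizing with spaCy's own tokenizer would be preferable, since the naive regex splits differently on contractions, hyphens and similar cases, and entities whose offsets do not line up with token boundaries raise an error here rather than being silently dropped.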