Correspondence between tags and labels
I am quite confused about the tags in the training files and in the TEI files produced in the prediction phase. The tags used for annotation do not seem to be the same as the tagging labels. There must be a mapping between them, but I cannot find it. For example, the tagginglabels file contains `<section>` and `<paragraph>` tags, but the result files contain only `<head>` and `<p>`. Where does the transformation take place? I've checked the saxparser and parser files, but I am still quite confused.
Thanks in advance.
- Created 5 years ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
Training data are in TEI XML (a flat TEI that preserves the stream order of the PDF). The mapping between the TEI XML tags and the labels used by the models is in the SAX parsers corresponding to the model in
The final results are then serialized into a more complex TEI XML (normalised order and more deeply structured). So there is also a mapping from the labels used by the models to this TEI, which is mainly done in the file TEIFormatter and, for substructures like date, person, etc., directly by the POJO classes under
Hope this makes things clearer! If not, don't hesitate to ask more.
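To illustrate the two mappings described above, here is a minimal sketch in Python. It is not the project's actual code: the names `TEI_TO_LABEL`, `LABEL_TO_TEI`, `TrainingTeiHandler`, `labels_from_training_tei` and `serialize_labeled_tokens` are hypothetical, and the mapping table only covers the two tag/label pairs mentioned in the question. The first half plays the role of a SAX parser that turns flat training TEI into labelled tokens. The second half plays the role of a formatter that turns labelled tokens back into output TEI elements.

```python
import xml.sax
from itertools import groupby
from xml.sax.saxutils import escape

# Hypothetical mapping tables, for illustration only (assumed from the
# <head>/<section> and <p>/<paragraph> pairs mentioned in the question).
TEI_TO_LABEL = {"head": "<section>", "p": "<paragraph>"}
LABEL_TO_TEI = {label: tag for tag, label in TEI_TO_LABEL.items()}


class TrainingTeiHandler(xml.sax.ContentHandler):
    """Collects (token, label) pairs from flat training TEI."""

    def __init__(self):
        super().__init__()
        self.current_label = None  # label of the element being read, if mapped
        self.buffer = []           # text chunks of the current element
        self.pairs = []            # resulting (token, label) pairs

    def startElement(self, name, attrs):
        if name in TEI_TO_LABEL:
            self.current_label = TEI_TO_LABEL[name]
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer it until the end tag.
        if self.current_label is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name in TEI_TO_LABEL and self.current_label is not None:
            for token in "".join(self.buffer).split():
                self.pairs.append((token, self.current_label))
            self.current_label = None


def labels_from_training_tei(xml_text):
    """Stage 1: training TEI tags -> model labels."""
    handler = TrainingTeiHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.pairs


def serialize_labeled_tokens(pairs):
    """Stage 2: model labels -> output TEI elements."""
    chunks = []
    # Consecutive tokens sharing a label are grouped into one element.
    for label, group in groupby(pairs, key=lambda pair: pair[1]):
        text = " ".join(token for token, _ in group)
        tag = LABEL_TO_TEI[label]
        chunks.append(f"<{tag}>{escape(text)}</{tag}>")
    return "".join(chunks)


if __name__ == "__main__":
    sample = "<text><head>1 Introduction</head><p>Hello world</p></text>"
    labelled = labels_from_training_tei(sample)
    print(labelled)
    print(serialize_labeled_tokens(labelled))
```

One simplification to keep in mind: because the sketch groups consecutive tokens by label, two adjacent paragraphs would be merged into a single `<p>`. A real pipeline needs some way to mark where a new segment begins.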