the correspondence about tags or labels

Hi,

I am quite confused about the tags in the training files and in the TEI files produced in the prediction phase. The tags used for annotation do not seem to be the same as the tagging labels. There must be a mapping between them, but I cannot find it. For example, the tagginglabels file contains <section> and <paragraph> tags, but only <head> and <p> can be found in the result files. Where does the transformation take place? I have checked the saxparser and parser files, but I am still confused.

Thanks in advance.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

kermitt2 commented, Dec 29, 2018

Hello!

The training data are in TEI XML (a flat TEI that preserves the stream order of the PDF). The mapping between the TEI XML tags and the labels used by the models is in the SAX parsers in grobid-trainer, one parser per model.
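To make the first mapping concrete, here is a minimal Python sketch of a SAX handler that turns flat training XML into (token, label) pairs. It is not GROBID's code, which is written in Java. The `TAG_TO_LABEL` table, the class name and the function name are assumptions made for illustration, using only the <head>/<section> and <p>/<paragraph> pairs mentioned in the question.

```python
import xml.sax

# Hypothetical tag-to-label table for illustration only.
# The real mappings live in the SAX parsers of grobid-trainer.
TAG_TO_LABEL = {"head": "<section>", "p": "<paragraph>"}


class TrainingHandler(xml.sax.ContentHandler):
    """Collects (token, label) pairs in document order."""

    def __init__(self):
        super().__init__()
        self.current_label = None
        self.labeled_tokens = []

    def startElement(self, name, attrs):
        # Entering an annotated element sets the label for its text.
        if name in TAG_TO_LABEL:
            self.current_label = TAG_TO_LABEL[name]

    def endElement(self, name):
        if name in TAG_TO_LABEL:
            self.current_label = None

    def characters(self, content):
        # Text outside any mapped element is ignored in this sketch.
        # Whitespace tokenization is a simplification.
        if self.current_label is None:
            return
        for token in content.split():
            self.labeled_tokens.append((token, self.current_label))


def labels_from_training_xml(xml_text):
    handler = TrainingHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.labeled_tokens
```

Calling `labels_from_training_xml("<text><head>1 Introduction</head><p>Hello world</p></text>")` yields each token paired with the label of its enclosing tag.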

The final results are then serialized into a more complex TEI XML (normalised order and more deeply structured). So there is also a mapping from the labels used by the models to this TEI. It is done mainly in the file TEIFormatter and, for substructures such as dates and persons, directly by the POJO classes under org.grobid.core.data.
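The label-to-TEI step on the output side can be sketched in the same spirit. The Python below is an illustration under stated assumptions, not what TEIFormatter does: the `LABEL_TO_ELEMENT` table and the function name are hypothetical, and wrapping each header and its paragraphs in a <div> is one simple way to show "more deeply structured" output.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Hypothetical label-to-element table for illustration only.
LABEL_TO_ELEMENT = {"<section>": "head", "<paragraph>": "p"}


def to_tei_body(labeled_tokens):
    """Serialize (token, label) pairs into a nested <body> string."""
    body = ET.Element("body")
    div = None
    # Consecutive tokens sharing a label form one element. Two adjacent
    # segments with the same label would be merged by this simplification.
    for label, group in groupby(labeled_tokens, key=lambda pair: pair[1]):
        element_name = LABEL_TO_ELEMENT.get(label)
        if element_name is None:
            continue  # labels without a mapping are skipped in this sketch
        # A section header opens a new <div>; so does a leading paragraph.
        if element_name == "head" or div is None:
            div = ET.SubElement(body, "div")
        child = ET.SubElement(div, element_name)
        child.text = " ".join(token for token, _ in group)
    return ET.tostring(body, encoding="unicode")
```

This shows why a <section> label from the model surfaces as <head> in the result file: the label names and the output element names are two separate vocabularies joined by a lookup at serialization time.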

Hope it makes things clearer! If not, don’t hesitate to ask more.

kermitt2 commented, Jan 23, 2019

Hi again @Punchwes! I am closing this issue, which was about the XML tag/CRF label correspondence.

I am opening a separate one about the level and number of section headers, to keep track of improvements on this aspect.
