
ptr type="web" not detected


I was training the citation model, and everything is detected correctly except the URL. This is an example of my training data:

<bibl> <author>Azaola, Elena</author> (<date>2009</date>). <title level="a">El comercio con el dolor y la esperanza. La extorsión telefónica en México</title>. <title level="j">URVIO, Revista Latinoamericana de Estudios de Seguridad</title>, <biblScope unit="volume"></biblScope>(<biblScope unit="issue" type="issue">6</biblScope>), <biblScope unit="page">115-122</biblScope>. <idno type="ISSN"> ISSN: 1390-3691</idno>. <ptr type="web">https://www.redalyc.org/articulo.oa?id=552656559008</ptr> </bibl>
<bibl> <author>Trejo Nieto, Alejandra</author> (<date>2013</date>). <title level="a">Las economías de las zonas metropolitanas de México en los albores del siglo xxi</title>. <title level="j">Estudios Demográficos y Urbanos</title>, <biblScope unit="volume">28</biblScope>(<biblScope unit="issue" type="issue">3</biblScope>), <biblScope unit="page">545-591</biblScope>. <idno type="ISSN"> ISSN: 0186-7210</idno>. <ptr type="web">https://www.redalyc.org/articulo.oa?id=31230011001</ptr> </bibl>

Maybe I am doing something wrong, but I can’t find it.

Issue Analytics

Created 2 years ago
Comments: 11 (5 by maintainers)

Top GitHub Comments


kermitt2 commented, Dec 8, 2021

Hello!

The encoding of the results follows TEI, so URLs are encoded like this by definition:

<ptr target="https://www.redalyc.org/articulo.oa?id=552656559008" />

<ptr> has no type, and the target URL is given by the @target attribute. Why do you think it is a problem?
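As a minimal sketch of reading the URL back out of a processed result: the fragment below is hand-written for illustration and is an assumption, not actual GROBID output (real output contains many more elements, and the position of `<ptr>` inside the reference may differ). The helper name `extract_urls` is hypothetical. It only relies on the point made above, namely that the URL lives in the `target` attribute.

```python
import xml.etree.ElementTree as ET
from typing import List

# TEI namespace used by TEI documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written fragment for illustration only (assumed shape, not real output).
RESULT = """<biblStruct xmlns="http://www.tei-c.org/ns/1.0">
  <analytic>
    <title level="a">El comercio con el dolor y la esperanza</title>
    <ptr target="https://www.redalyc.org/articulo.oa?id=552656559008" />
  </analytic>
</biblStruct>"""


def extract_urls(tei_xml: str) -> List[str]:
    """Collect the @target value of every <ptr> element in a TEI fragment."""
    root = ET.fromstring(tei_xml)
    return [
        ptr.get("target")
        for ptr in root.iterfind(".//tei:ptr", TEI_NS)
        if ptr.get("target")
    ]


urls = extract_urls(RESULT)
print(urls)
```

Because the search uses `.//tei:ptr`, the helper finds the element wherever it sits inside the reference.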

I should stress that the encoding of the training data is different from the encoding of the final processed result. Grobid parsing results are metadata, so they are normalized and independent of any particular order/presentation/serialization. It’s the format a catalogue would expect, for instance.

Training data follow the input (for instance, noisy token sequences from a PDF) and thus are not normalized. Because they follow the input string exactly, the encoding is “inline”: it identifies the spans to be extracted, so content is never placed in an attribute (XML attributes must be normalized to avoid XML failures).
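The difference between the two encodings can be sketched in a few lines. This is an illustration under assumptions, not part of GROBID: the function names `raw_string`, `inline_urls`, and `normalized_ptr` are hypothetical, and the training snippet is a shortened version of the example from the question. It shows that removing the tags from an inline annotation gives back the input string, while the normalized form moves the URL into `@target`.

```python
import xml.etree.ElementTree as ET

# Shortened training-style annotation: tags wrap spans of the raw reference.
TRAINING = (
    '<bibl><author>Trejo Nieto, Alejandra</author> (<date>2013</date>). '
    '<title level="j">Estudios Demográficos y Urbanos</title>. '
    '<ptr type="web">https://www.redalyc.org/articulo.oa?id=31230011001</ptr></bibl>'
)


def raw_string(bibl_xml):
    """Drop the inline tags; what remains is the original reference string."""
    return "".join(ET.fromstring(bibl_xml).itertext())


def inline_urls(bibl_xml):
    """Return the text content of each inline <ptr> span."""
    return [(p.text or "").strip() for p in ET.fromstring(bibl_xml).iter("ptr")]


def normalized_ptr(url):
    """Build the result-style element: no type, URL carried by @target."""
    return ET.tostring(ET.Element("ptr", {"target": url}), encoding="unicode")


print(raw_string(TRAINING))
for url in inline_urls(TRAINING):
    print(normalized_ptr(url))
```

In the training form the URL is element content, so it stays at its position in the token sequence; in the result form it is an attribute value detached from that sequence.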

To generate pre-annotated training data in this format, you can use the batch method createTraining, which produces inline annotations on the exact input reference strings.


rodyoukai commented, Dec 9, 2021

I understand. Sorry, English is not my native language and I sometimes have trouble with comprehension. I will retrain the model and check. Thanks for your time and patience.
