
ptr type="web" not detected


Hi

I was training the citation model, and everything is detected correctly except the URL. Here is an example of my training data:

<bibl> <author>Azaola, Elena</author> (<date>2009</date>). <title level="a">El comercio con el dolor y la esperanza. La extorsión telefónica en México</title>. <title level="j">URVIO, Revista Latinoamericana de Estudios de Seguridad</title>, <biblScope unit="volume"></biblScope>(<biblScope unit="issue" type="issue">6</biblScope>), <biblScope unit="page">115-122</biblScope>. <idno type="ISSN"> ISSN: 1390-3691</idno>. <ptr type="web">https://www.redalyc.org/articulo.oa?id=552656559008</ptr> </bibl>

<bibl> <author>Trejo Nieto, Alejandra</author> (<date>2013</date>). <title level="a">Las economías de las zonas metropolitanas de México en los albores del siglo xxi</title>. <title level="j">Estudios Demográficos y Urbanos</title>, <biblScope unit="volume">28</biblScope>(<biblScope unit="issue" type="issue">3</biblScope>), <biblScope unit="page">545-591</biblScope>. <idno type="ISSN"> ISSN: 0186-7210</idno>. <ptr type="web">https://www.redalyc.org/articulo.oa?id=31230011001</ptr> </bibl>

Maybe I am doing something wrong, but I can't find it.

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

1 reaction
kermitt2 commented, Dec 8, 2021

Hello !

The encoding of the results follows the TEI, so URLs are encoded like this by definition:

<ptr target="https://www.redalyc.org/articulo.oa?id=552656559008" /> 

<ptr> has no type, and the target URL is defined by the @target attribute. Why do you think it is a problem?
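To illustrate reading the URL from a result encoded this way, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The helper name `url_from_ptr` is hypothetical and is not part of Grobid; the fragment is the `<ptr>` element quoted above, parsed on its own without the surrounding TEI document or namespace.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The <ptr> element from the processed result quoted above.
result_fragment = '<ptr target="https://www.redalyc.org/articulo.oa?id=552656559008" />'


def url_from_ptr(fragment: str) -> Optional[str]:
    """Return the @target value of a standalone <ptr> fragment (hypothetical helper)."""
    element = ET.fromstring(fragment)
    if element.tag != "ptr":
        return None
    # In the result encoding the URL lives in the attribute, not in the element text.
    return element.get("target")


print(url_from_ptr(result_fragment))
```

In a full TEI result the element would carry the TEI namespace, so the tag comparison would need to account for that.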

I should stress that the encoding of the training data is different from the encoding of the final processed result. Grobid parsing results are metadata, so they are normalized and independent of any particular order, presentation, or serialization. This is the format a catalogue expects, for instance.

Training data follow the input (for instance noisy token sequences from a PDF) and thus are not normalized. As they follow exactly the input string, the encoding is “inline”, identifying spans to be extracted, so content is never in an attribute (XML attributes must be normalized to avoid XML failures).
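The difference between the two encodings can be sketched as follows. This is only an illustration under stated assumptions, not Grobid's actual implementation: the function name `normalize_ptr` is hypothetical, and the input is the inline `<ptr type="web">` span taken from the training example in the question.

```python
import xml.etree.ElementTree as ET

# Inline training annotation: the URL is element text, exactly as it appears in the input.
inline_fragment = '<ptr type="web">https://www.redalyc.org/articulo.oa?id=552656559008</ptr>'


def normalize_ptr(fragment: str) -> str:
    """Turn an inline <ptr>URL</ptr> span into a <ptr target="URL"/> element (hypothetical helper)."""
    inline = ET.fromstring(fragment)
    url = (inline.text or "").strip()
    # The normalized form carries the URL in @target and has no text content;
    # ElementTree escapes the attribute value when serializing.
    normalized = ET.Element("ptr", {"target": url})
    return ET.tostring(normalized, encoding="unicode")


print(normalize_ptr(inline_fragment))
```

The inline form marks a span in the raw reference string, while the normalized form is what appears in the processed result, which is why the two do not look the same.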

To generate training data in the pre-annotated format, you can use the batch method createTraining, which produces inline annotations on the exact input reference strings.

0 reactions
rodyoukai commented, Dec 9, 2021

I understand. Sorry, English is not my native language, and sometimes I have trouble understanding. I will retrain the model and check. Thanks for your time and patience.
