Header not returning reference

I am running Grobid on Ubuntu 22.04.1 LTS using Windows Subsystem for Linux. Java version:

java

For some reason I can’t manage to output reference data from the Header Model. I have tried running it in batch mode and by using Grobid service. I tried both processHeader and processFullText. Neither worked.

I used the createTraining command to generate data, then edited the header.tei.xml files and retrained the model. When I use the new model to generate training data, the generated training data (training.header file) is tagged correctly, but the output tei.xml file doesn’t show that information. The reference within the header is also parsed correctly in the training.header.reference file.

Why aren’t the tags present in the output file even though they are in the training data? I am using the <reference> tag.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
kermitt2 commented, Aug 22, 2022

Hi @kuubikus !

Indeed, in the current version, the metadata of the parsed reference found in the header section is not injected into the result (only metadata coming from the “consolidation” mechanism might be injected). More precisely, the code for this is present but commented out, because it requires some review. So far I haven’t found the time to work on this part again 😕

If you want the header reference information injected into the resulting header TEI, you simply need to uncomment the following lines:

https://github.com/kermitt2/grobid/blob/master/grobid-core/src/main/java/org/grobid/core/engines/HeaderParser.java#L246

// As it currently appears in HeaderParser.java (commented out):
/*if (resHeader.getReference() != null) {
    BiblioItem refer = parsers.getCitationParser().processingString(resHeader.getReference(), 0);
    BiblioItem.correct(resHeader, refer);
}*/

// With the comment markers removed:
if (resHeader.getReference() != null) {
    BiblioItem refer = parsers.getCitationParser().processingString(resHeader.getReference(), 0);
    BiblioItem.correct(resHeader, refer);
}

The issue, as it stands, is that this might overwrite some “original” fields extracted from the header with the values from the reference. Some decision logic still needs to be added to keep the most reliable value when the metadata conflicts (that is, when a field comes both from the header itself and from the reference in the header).
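To illustrate the kind of decision logic meant here, below is a minimal sketch of one possible policy: keep every non-blank header value and take a reference value only where the header has none. This is an assumption for illustration, not how Grobid's `BiblioItem.correct` works. The class `HeaderReferenceMerge`, the method `preferHeader`, and the use of plain maps in place of `BiblioItem` are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical illustration only: this is NOT Grobid's BiblioItem.correct().
 * Metadata is modelled as a plain "field name -> value" map.
 */
final class HeaderReferenceMerge {

    private HeaderReferenceMerge() {
    }

    /**
     * Merges two metadata maps. Every non-blank header value is kept;
     * a reference value is used only for fields the header left empty.
     */
    static Map<String, String> preferHeader(Map<String, String> header,
                                            Map<String, String> reference) {
        Map<String, String> merged = new LinkedHashMap<>();
        // First pass: the header is treated as the more reliable source.
        for (Map.Entry<String, String> entry : header.entrySet()) {
            if (!isBlank(entry.getValue())) {
                merged.put(entry.getKey(), entry.getValue());
            }
        }
        // Second pass: fill remaining gaps from the parsed reference.
        for (Map.Entry<String, String> entry : reference.entrySet()) {
            if (!isBlank(entry.getValue())) {
                merged.putIfAbsent(entry.getKey(), entry.getValue());
            }
        }
        return merged;
    }

    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }
}
```

With this policy a title found in both places keeps the header's version, while a publication number found only in the reference is still carried over. Whether the header really is the more reliable source for every field is exactly the open question raised above.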

0 reactions
kuubikus commented, Aug 29, 2022

I managed to make it work. First I changed the BiblioItem.java file to “correct” idno numbers even if it didn’t have a global level of acceptance. https://github.com/kermitt2/grobid/blob/0326b2872304a8de1be1e3583ae5811ec406c9f5/grobid-core/src/main/java/org/grobid/core/data/BiblioItem.java#L3963

if (bibo.getPubnum() != null)
    bib.setPubnum(bibo.getPubnum());

I then added these lines to the TEIFormatter.java file. https://github.com/kermitt2/grobid/blob/0326b2872304a8de1be1e3583ae5811ec406c9f5/grobid-core/src/main/java/org/grobid/core/document/TEIFormatter.java#L312

if (idno != null) {
    tei.append("\t\t\t\t\t\t<idno>" + idno + "</idno>\n");
}

Now the output file displays idno but only if I use Grobid Service. For some reason it doesn’t work with Batch Mode.
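One thing to watch with the `tei.append` approach: the identifier is concatenated into the XML as-is, so a value containing `&` or `<` would produce malformed TEI. A minimal sketch of a safer variant follows. The class `IdnoTei` and its methods are hypothetical helpers written for this example, not part of Grobid, and the six-tab indentation is simply copied from the snippet above.

```java
/**
 * Hypothetical helper, not part of Grobid: appends an <idno> element
 * to a TEI buffer after escaping XML-significant characters.
 */
final class IdnoTei {

    private IdnoTei() {
    }

    /** Escapes the characters that are unsafe inside XML element text. */
    static String escapeXml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    /** Appends an <idno> line; does nothing for a null or blank identifier. */
    static void appendIdno(StringBuilder tei, String idno) {
        if (idno == null || idno.trim().isEmpty()) {
            return;
        }
        tei.append("\t\t\t\t\t\t<idno>")
           .append(escapeXml(idno.trim()))
           .append("</idno>\n");
    }
}
```

Skipping blank identifiers also avoids emitting an empty `<idno></idno>` element when the parsed reference carries no publication number.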
