GROBID makes mistakes when a reference contains square brackets


Paper:

Use of gasotransmitters for the controlled release of polymer-based nitric oxide carriers in medical applications - ScienceDirect

Reference:

image

TEI Result:

...
<biblStruct coords="13,326.72,602.79,229.86,5.73;13,326.72,610.79,230.88,5.73" xml:id="b160">
	<monogr>
		<title level="m" type="main">Click&quot; reactions for the N-terminal and side-chain functionalization of peptides with</title>
		<author>
			<persName coords=""><forename type="first">H</forename><surname>Pfeiffer</surname></persName>
		</author>
		<author>
			<persName coords=""><forename type="first">A</forename><surname>Rojas</surname></persName>
		</author>
		<author>
			<persName coords=""><forename type="first">J</forename><surname>Niesel</surname></persName>
		</author>
		<author>
			<persName coords=""><forename type="first">U</forename><surname>Schatzschneider</surname></persName>
		</author>
		<author>
			<persName coords=""><surname>Sonogashira</surname></persName>
		</author>
		<imprint>
			<pubPlace>Mn(CO)</pubPlace>
		</imprint>
	</monogr>
	<note type="raw_reference">H. Pfeiffer, A. Rojas, J. Niesel, U. Schatzschneider, Sonogashira and &quot;Click&quot; reac- tions for the N-terminal and side-chain functionalization of peptides with [Mn(CO)</note>
</biblStruct>
<biblStruct coords="13,350.04,618.72,207.57,5.73;13,326.72,626.72,74.63,5.73" xml:id="b161">
	<monogr>
		<title level="m" type="main">+-based CO releasing molecules (tpm = tris(pyrazolyl)methane)</title>
		<imprint>
			<date type="published" when="2009" />
			<biblScope unit="page" from="4292" to="4298" />
			<pubPlace>Dalton Trans</pubPlace>
		</imprint>
	</monogr>
	<note type="raw_reference">+-based CO releasing molecules (tpm = tris(pyrazolyl)methane), Dalton Trans. (2009) 4292-4298.</note>
</biblStruct>
...

Description:

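GROBID seems to mis-segment a reference that contains square brackets. In the TEI result above, a single reference is cut after `[Mn(CO)` and emitted as two `biblStruct` entries (`b160` and `b161`); the first also ends up with a bogus author ("Sonogashira") and a bogus `pubPlace` ("Mn(CO)").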
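
One cheap way to spot this kind of split after the fact is to check each `raw_reference` note for unbalanced brackets. The following is a minimal Python sketch, not part of GROBID: the function names `is_balanced` and `suspicious_references` are made up for illustration, and the sample is a trimmed copy of the TEI above wrapped in a `listBibl` element so that it parses on its own.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree exposes xml:id


def local(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def is_balanced(text):
    """Return True if round and square brackets in text are properly nested."""
    closers = {")": "(", "]": "["}
    stack = []
    for ch in text:
        if ch in "([":
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack


def suspicious_references(tei_xml):
    """List xml:id values of biblStruct entries whose raw reference has unbalanced brackets."""
    flagged = []
    for bibl in ET.fromstring(tei_xml).iter():
        if local(bibl.tag) != "biblStruct":
            continue
        for note in bibl.iter():
            if local(note.tag) == "note" and note.get("type") == "raw_reference":
                if not is_balanced(note.text or ""):
                    flagged.append(bibl.get(XML_ID))
    return flagged


# Trimmed copy of the two entries from the TEI result above.
SAMPLE = """<listBibl>
  <biblStruct xml:id="b160">
    <note type="raw_reference">H. Pfeiffer, A. Rojas, J. Niesel, U. Schatzschneider, Sonogashira and &quot;Click&quot; reac- tions for the N-terminal and side-chain functionalization of peptides with [Mn(CO)</note>
  </biblStruct>
  <biblStruct xml:id="b161">
    <note type="raw_reference">+-based CO releasing molecules (tpm = tris(pyrazolyl)methane), Dalton Trans. (2009) 4292-4298.</note>
  </biblStruct>
</listBibl>"""

print(suspicious_references(SAMPLE))  # ['b160']
```

This only flags the half that keeps the opening bracket. Here the second half (`b161`) stays balanced because the closing bracket was lost in extraction, so a flagged entry is best treated as a hint to inspect its neighbours too.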

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction
kermitt2 commented, Dec 19, 2021

Hi @elonzh !

Unfortunately, the two articles given as examples cannot be used as training data: the first one is CC BY but non-derivative, and the second one is closed access.

If you happen to find similar errors in one or two CC-BY articles, don't hesitate to reference them here 😃

0 reactions
elonzh commented, Dec 27, 2021

Another error case:

Paper:

[math/0506081] The Dantzig selector: Statistical estimation when $p$ is much larger than $n$

Reference:

image


I detect reference errors by clustering alignments. The algorithm is context-free and works well when the page is well-formatted.

Maybe we could integrate the algorithm into GROBID?
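
The comment does not describe the algorithm, so the following is only a guess at what an alignment check could look like, written as a small Python sketch. It assumes each `coords` attribute is a `;`-separated list of `page,x,y,w,h` boxes, as in the TEI result above, and flags references whose first line does not start at the dominant left margin. The names `first_line_x` and `misaligned` are hypothetical, and the entries `b159` and `b162` are invented neighbours added only so that a dominant margin exists.

```python
from collections import Counter


def first_line_x(coords):
    """Return the x position of the first bounding box in a coords attribute."""
    first_box = coords.split(";")[0]
    _page, x, _y, _w, _h = first_box.split(",")
    return float(x)


def misaligned(coords_by_id, tolerance=2.0):
    """Flag references whose first line starts away from the most common left margin."""
    xs = {ref_id: first_line_x(c) for ref_id, c in coords_by_id.items()}
    # Cluster by rounded x and take the most frequent value as the margin.
    margin = Counter(round(x) for x in xs.values()).most_common(1)[0][0]
    return [ref_id for ref_id, x in xs.items() if abs(x - margin) > tolerance]


COORDS = {
    "b159": "13,326.72,586.79,229.86,5.73",  # hypothetical neighbour
    "b160": "13,326.72,602.79,229.86,5.73;13,326.72,610.79,230.88,5.73",
    "b161": "13,350.04,618.72,207.57,5.73;13,326.72,626.72,74.63,5.73",
    "b162": "13,326.72,634.72,229.86,5.73",  # hypothetical neighbour
}

print(misaligned(COORDS))  # ['b161']
```

In this data, `b161` begins at x = 350.04 while its own second line and the other entries begin at x = 326.72, which is what marks it as a fragment. A real implementation would need to group boxes by page and column first, since each column has its own margin.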
