Timeout/Error when processing an MDPI PDF

Hello!

I successfully deployed a grobid-server and I was parsing some articles. Everything was smooth until I found this paper: https://www.mdpi.com/2076-3921/11/2/251

I downloaded its PDF and tried to parse it with the following command:

curl -v --form input=@./mdpi_article.pdf localhost:8070/api/processFulltextDocument

I get an XML with the following content:

[TIMEOUT] PDF to XML conversion timed out

When I use the processHeaderDocument endpoint instead, everything works as expected and the article headers (title, abstract, etc.) are parsed correctly:

curl -v --form input=@./mdpi_article.pdf localhost:8070/api/processHeaderDocument
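For batch runs, the two calls above could be scripted so that a timed-out document is flagged instead of being mistaken for a parsed result. The following is a minimal sketch, not part of GROBID: the helper names (`build_curl_command`, `is_timeout_response`, `process_pdf`) are hypothetical, the base URL is assumed from the commands in this issue, and the check relies only on the `[TIMEOUT]` text quoted above appearing in the response body.

```python
import subprocess

# Marker text quoted in the issue; assumed to appear verbatim in the body.
TIMEOUT_MARKER = "[TIMEOUT]"


def build_curl_command(pdf_path, endpoint, base_url="http://localhost:8070"):
    """Build the curl call from the issue (-s instead of -v keeps output clean)."""
    return ["curl", "-s", "--form", f"input=@{pdf_path}", f"{base_url}/api/{endpoint}"]


def is_timeout_response(body):
    """Return True when the body carries the timeout marker instead of a parse."""
    return TIMEOUT_MARKER in body


def process_pdf(pdf_path, endpoint):
    """Run curl and return (body, timed_out). Needs curl and a running server."""
    result = subprocess.run(
        build_curl_command(pdf_path, endpoint),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout, is_timeout_response(result.stdout)
```

Calling `process_pdf("./mdpi_article.pdf", "processFulltextDocument")` and, when it reports a timeout, retrying with `"processHeaderDocument"` would mirror the manual steps described in this issue.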

This is the error I got:

ERROR [2022-04-06 14:07:25,987] org.grobid.core.process.ProcessPdfToXml: pdfalto process finished with error code: 143. [/opt/grobid/grobid-home/pdfalto/lin-64/pdfalto_server, -fullFontName, -noLineNumbers, -noImage, -annotation, -filesLimit, 2000, /opt/grobid/grobid-home/tmp/origin3229937128031954158.pdf, /opt/grobid/grobid-home/tmp/4wDQxkvfeZ.lxml]
ERROR [2022-04-06 14:07:25,987] org.grobid.core.process.ProcessPdfToXml: pdfalto return message: 

ERROR [2022-04-06 14:07:25,988] org.grobid.service.process.GrobidRestProcessFiles: An unexpected exception occurs.
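A note on the error code in the log: by common Unix convention, an exit status above 128 means the process was ended by a signal, with the signal number equal to the status minus 128. Under that convention, 143 corresponds to signal 15 (SIGTERM), which would fit pdfalto being terminated from outside rather than failing on its own. The thread does not confirm what sent the signal, so treat this as a reading of the log, not a diagnosis. The helper below is a hypothetical sketch for decoding such codes.

```python
import signal


def describe_exit_code(code):
    """Interpret a process exit status using the 128 + signal-number convention."""
    if code > 128:
        try:
            # signal.Signals raises ValueError for numbers that are not signals.
            return f"terminated by {signal.Signals(code - 128).name}"
        except ValueError:
            return f"exit code {code} (no matching signal)"
    return f"exited with status {code}"


print(describe_exit_code(143))
```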

You mention here that this error can be a bit of anything, so let me know if you need more data to replicate it. The server is configured with 10 threads and a timeout of 120 seconds, yet I get this “timeout error” after 20 seconds.

Top GitHub Comments

mazzespazze commented, Apr 25, 2022

@kermitt2 I can confirm that upgrading to version 0.7.1 solved the issue. In a parallel universe, it would be really nice to know what it was and how it was fixed. But I am more than content with this being solved!

Thanks for the tip and sorry for the delay, a lot of stuff happened during Easter holidays that had priority.

kermitt2 commented, Apr 25, 2022

@mazzespazze thanks for the feedback, and good that this PDF is working now too! To be honest, I cannot immediately point to which of the fixes we made in the last months solved this problem. It is actually quite time-consuming to track these problems back to the PDF parsing, which is likely where the trouble was taking place, so it is related to pdfalto or the interface with pdfalto.
