
Timeout/Error when processing an MDPI PDF


Hello!

I successfully deployed a grobid-server and I was parsing some articles. Everything was smooth until I found this paper: https://www.mdpi.com/2076-3921/11/2/251

I downloaded its PDF and tried to parse it with the following command:

    curl -v --form input=@./mdpi_article.pdf localhost:8070/api/processFulltextDocument

And I get an XML with the following content:

    [TIMEOUT] PDF to XML conversion timed out
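When processing many PDFs in a batch, it can help to separate responses like this one from real results. The following is a minimal sketch, assuming the response body contains the literal `[TIMEOUT]` marker exactly as quoted above; the function names and the sample bodies are hypothetical and not part of GROBID or its clients.

```python
# Marker text as quoted in the response above (assumed to appear verbatim).
TIMEOUT_MARKER = "[TIMEOUT]"


def is_timeout_response(body: str) -> bool:
    """Return True if a response body carries the timeout marker."""
    return TIMEOUT_MARKER in body


def split_by_outcome(responses: dict[str, str]) -> tuple[list[str], list[str]]:
    """Split {file name: response body} into (ok, timed_out) file-name lists."""
    ok: list[str] = []
    timed_out: list[str] = []
    for name, body in responses.items():
        if is_timeout_response(body):
            timed_out.append(name)
        else:
            ok.append(name)
    return ok, timed_out


# Hypothetical sample bodies, for illustration only.
sample = {
    "mdpi_article.pdf": "[TIMEOUT] PDF to XML conversion timed out",
    "other_article.pdf": "<TEI>...</TEI>",
}
ok, timed_out = split_by_outcome(sample)
print("ok:", ok)
print("timed out:", timed_out)
```

Files in the timed-out list could then be retried, or sent to processHeaderDocument as a fallback.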

When I try the processHeaderDocument endpoint instead, everything works as expected and the article header fields (title, abstract, etc.) are parsed correctly:

    curl -v --form input=@./mdpi_article.pdf localhost:8070/api/processHeaderDocument

This is the error I got:

ERROR [2022-04-06 14:07:25,987] org.grobid.core.process.ProcessPdfToXml: pdfalto process finished with error code: 143. [/opt/grobid/grobid-home/pdfalto/lin-64/pdfalto_server, -fullFontName, -noLineNumbers, -noImage, -annotation, -filesLimit, 2000, /opt/grobid/grobid-home/tmp/origin3229937128031954158.pdf, /opt/grobid/grobid-home/tmp/4wDQxkvfeZ.lxml]
ERROR [2022-04-06 14:07:25,987] org.grobid.core.process.ProcessPdfToXml: pdfalto return message: 

ERROR [2022-04-06 14:07:25,988] org.grobid.service.process.GrobidRestProcessFiles: An unexpected exception occurs. 
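A note on the error code in the log above: on POSIX systems, an exit code above 128 conventionally means the process was terminated by signal number (code - 128). 143 is 128 + 15, i.e. SIGTERM. That is consistent with pdfalto being terminated from outside (for example by a timeout or watchdog) rather than exiting with an error of its own, though the log alone does not say what sent the signal. A small Python sketch of the decoding; `describe_exit_code` is a hypothetical helper name:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a process exit code.

    Assumes the common POSIX convention that a code above 128 means
    "terminated by signal (code - 128)".
    """
    if code > 128:
        signum = code - 128
        try:
            return f"terminated by {signal.Signals(signum).name}"
        except ValueError:
            # No signal with that number is known on this platform.
            return f"terminated by unknown signal {signum}"
    return f"exited with status {code}"


print(describe_exit_code(143))  # terminated by SIGTERM
```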

You mention here that this error can be caused by almost anything, so let me know if you need more data to replicate it. The server is configured with 10 threads and a timeout of 120 seconds, yet I get this “timeout error” after 20 seconds.

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
mazzespazze commented, Apr 25, 2022

@kermitt2 I can confirm that upgrading to version 0.7.1 solved the issue. In a parallel universe, it would be really nice to know what it was and how it was fixed. But I am more than content with this being solved!

Thanks for the tip and sorry for the delay, a lot of stuff happened during Easter holidays that had priority.

0 reactions
kermitt2 commented, Apr 25, 2022

@mazzespazze thanks for the feedback, and good that this PDF is working now too! To be honest, I cannot immediately point to which of the fixes we made in the last months solved this problem. It is quite time-consuming to track such problems back to the PDF parsing, which is likely where the trouble was taking place, so it is related to pdfalto or the interface with pdfalto.
