[BAD_INPUT_DATA] PDF to XML conversion failed with error code: 99


Hi, when using GROBID on Windows 8.1 to produce TEI, I get the following error for any PDF file.

Error encountered while requesting the server.
[BAD_INPUT_DATA] PDF to XML conversion failed with error code: 99

The following is from the log:

Model path: D:\...\grobid-master\grobid-home\models\header\model.wapiti
[DEBUG] org.grobid.core.document.DocumentSource: start pdf2xml
[DEBUG] org.grobid.core.document.DocumentSource: Executing: [D:\...\grobid-master\grobid-home\pdf2xml\win-64\pdftoxml_server, -blocks, -noImageInline, -fullFontName, -noImage, -annotation, -l, 2, D:\...\grobid-master\grobid-home\tmp\origin1660742701096340959.pdf, D:\...\grobid-master\grobid-home\tmp\CSccvkYTdu.lxml]
[ERROR] org.grobid.core.process.ProcessPdf2Xml: pdftoxml process finished with error code: 99. [D:\...\grobid-master\grobid-home\pdf2xml\win-64\pdftoxml_server, -blocks, -noImageInline, -fullFontName, -noImage, -annotation, -l, 2, D:\...\grobid-master\grobid-home\tmp\origin1660742701096340959.pdf, D:\...\grobid-master\grobid-home\tmp\CSccvkYTdu.lxml]
[ERROR] org.grobid.core.process.ProcessPdf2Xml: pdftoxml return message:pdftoxml version 1.0
(Based on Xpdf version 3.01, Copyright 1996-2005 Glyph & Cog, LLC)
Copyright 2004-2006 XEROX XRCE
Usage: pdftoxml [options] <PDF-file> [<xml-file>]
  -f <int>               : first page to convert
  -l <int>               : last page to convert
  -verbose               : display pdf attributes
  -noText                : do not extract textual objects
  -noImage               : do not extract Images (Bitmap and Vectorial)
  -noImageInline         : do not include images inline in the stream
  -outline               : create an outline file xml
  -annots                : create an annotations file xml
  -cutPages              : cut all pages in separately files
  -blocks                : add blocks informations whithin the structure
  -fullFontName          : fonts names are not normalized
  -nsURI <string>        : add the specified namespace URI
  -opw <string>          : owner password (for encrypted files)
  -upw <string>          : user password (for encrypted files)
  -q                     : don't print any messages or errors
  -v                     : print copyright and version info
  -h                     : print usage information
  -help                  : print usage information
  --help                 : print usage information
  -?                     : print usage information

[ERROR] org.grobid.service.process.GrobidRestProcessFiles: An unexpected exception occured: org.grobid.core.exceptions.GrobidException: [BAD_INPUT_DATA] org.grobid.core.exceptions.GrobidException: [BAD_INPUT_DATA] PDF to XML conversion failed with error code: 99
[DEBUG] org.grobid.core.utilities.IOUtilities: Removing D:\...\grobid-master\grobid-home\tmp\origin1660742701096340959.pdf
[DEBUG] org.grobid.service.process.GrobidRestProcessFiles: << org.grobid.service.process.GrobidRestProcessFiles.methodLogOut

Here the command has been constructed with {{-annotation}} as an argument, but according to the usage text in the log, the argument name should be {{-annots}}. If I manually change the command to use {{-annots}} and run it from the Windows command prompt, it is able to convert the PDF to XML.
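The mismatch described above can be checked mechanically by comparing the logged arguments against the options listed in the usage text. The following is a minimal sketch, not part of GROBID: the function names are hypothetical, the usage text is abridged from the log, and the elided file paths are replaced with placeholder names.

```python
import re

# Usage lines copied (abridged) from the pdftoxml output in the log.
USAGE = """\
  -f <int>               : first page to convert
  -l <int>               : last page to convert
  -noImage               : do not extract Images (Bitmap and Vectorial)
  -noImageInline         : do not include images inline in the stream
  -annots                : create an annotations file xml
  -blocks                : add blocks informations whithin the structure
  -fullFontName          : fonts names are not normalized
"""

# Arguments from the "Executing:" log line. The real paths are elided in the
# log, so placeholder file names are used here.
LOGGED_ARGS = ["-blocks", "-noImageInline", "-fullFontName", "-noImage",
               "-annotation", "-l", "2", "input.pdf", "output.lxml"]


def supported_flags(usage):
    """Collect option names from lines shaped like '  -flag [<type>] : text'."""
    pattern = r"^\s*(-{1,2}[\w?]+)(?:\s+<\w+>)?\s+:"
    return set(re.findall(pattern, usage, flags=re.MULTILINE))


def unsupported_flags(args, usage):
    """Return the dash-prefixed arguments that the usage text does not list."""
    known = supported_flags(usage)
    return [arg for arg in args if arg.startswith("-") and arg not in known]


if __name__ == "__main__":
    print(unsupported_flags(LOGGED_ARGS, USAGE))
```

With the arguments from the log, the only flag reported as unsupported is `-annotation`, which matches the observation that the binary printed its usage text and exited with code 99.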

I have checked the code, and -annotation is hard-coded as an argument with no configuration option to change it to {{-annots}}: https://github.com/kermitt2/grobid/blob/master/grobid-core/src/main/java/org/grobid/core/document/DocumentSource.java#L81
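The manual command-prompt run can be reproduced with a small script that swaps the flag before invoking the binary. This is only a sketch for testing the binary outside GROBID: `patch_args` and `run_pdftoxml` are hypothetical helpers, and the script does not change the arguments that GROBID itself builds in DocumentSource.java.

```python
import subprocess

# Flag substitution taken from the issue: the bundled binary lists -annots
# in its usage text, while the logged command passes -annotation.
FLAG_FIXES = {"-annotation": "-annots"}


def patch_args(args):
    """Return a copy of args with the unrecognized flag renamed."""
    return [FLAG_FIXES.get(arg, arg) for arg in args]


def run_pdftoxml(executable, args):
    """Run the given pdftoxml executable with patched arguments.

    `executable` is the path to the binary (elided in the log, so it has
    to be supplied by the caller). Returns the CompletedProcess so the
    exit code and output can be inspected.
    """
    command = [executable, *patch_args(args)]
    return subprocess.run(command, capture_output=True, text=True)
```

A return code of 0 from `run_pdftoxml` would confirm that the flag name alone causes the failure, as the manual test suggests.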

Can you please suggest a workaround or possible solution?

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

lfoppiano commented, Sep 25, 2020 (1 reaction)

@vishnudas-raveendran GROBID is not supported on Windows. Unfortunately, three platforms are too many for us to maintain, so I recommend running it with Docker.

See some more information: https://grobid.readthedocs.io/en/latest/Troubleshooting/#windows-related-issues
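Once GROBID runs in a container, a Windows client only needs to talk to its REST service. The sketch below posts a PDF using the Python standard library. It assumes the service is reachable at `http://localhost:8070` and exposes `/api/processFulltextDocument` with a multipart field named `input`, as described in the GROBID service documentation; the port mapping depends on how the container is started, and the helper names are hypothetical.

```python
import urllib.request
import uuid


def build_multipart(field, filename, payload):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def process_fulltext(pdf_path, base_url="http://localhost:8070"):
    """Send a PDF to a running GROBID service and return the TEI XML text.

    base_url is an assumption; adjust it to the container's port mapping.
    """
    with open(pdf_path, "rb") as handle:
        body, content_type = build_multipart("input", "document.pdf", handle.read())
    request = urllib.request.Request(
        f"{base_url}/api/processFulltextDocument",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Because the conversion then happens inside the Linux container, the Windows `pdftoxml_server` binary from the log is never invoked.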

vishnudas-raveendran commented, Sep 25, 2020 (0 reactions)

Hi @lfoppiano, I cloned the latest GROBID today (v0.6.1) and ran into the PDF to XML conversion error. I am attaching grobid-service.log and the console log. When running against the cloud-miner service, it works fine.

I need to run it locally. grobid-service.log file: grobid-service.log

Console log from running the batch from GrobidMain: console_Err

Screenshot of running as a service: 99

The result is the same for PDF files from any source.

Let me know if you need any more info.
