[BAD_INPUT_DATA] PDF to XML conversion failed with error code: 99
Hi, when using Grobid on Windows 8.1 to produce TEI, I get the following error for any PDF file.
Error encountered while requesting the server.
[BAD_INPUT_DATA] PDF to XML conversion failed with error code: 99
The following is from the log:
Model path: D:\...\grobid-master\grobid-home\models\header\model.wapiti
[DEBUG] org.grobid.core.document.DocumentSource: start pdf2xml
[DEBUG] org.grobid.core.document.DocumentSource: Executing: [D:\...\grobid-master\grobid-home\pdf2xml\win-64\pdftoxml_server, -blocks, -noImageInline, -fullFontName, -noImage, -annotation, -l, 2, D:\...\grobid-master\grobid-home\tmp\origin1660742701096340959.pdf, D:\...\grobid-master\grobid-home\tmp\CSccvkYTdu.lxml]
[ERROR] org.grobid.core.process.ProcessPdf2Xml: pdftoxml process finished with error code: 99. [D:\...\grobid-master\grobid-home\pdf2xml\win-64\pdftoxml_server, -blocks, -noImageInline, -fullFontName, -noImage, -annotation, -l, 2, D:\...\grobid-master\grobid-home\tmp\origin1660742701096340959.pdf, D:\...\grobid-master\grobid-home\tmp\CSccvkYTdu.lxml]
[ERROR] org.grobid.core.process.ProcessPdf2Xml: pdftoxml return message:pdftoxml version 1.0
(Based on Xpdf version 3.01, Copyright 1996-2005 Glyph & Cog, LLC)
Copyright 2004-2006 XEROX XRCE
Usage: pdftoxml [options] <PDF-file> [<xml-file>]
-f <int> : first page to convert
-l <int> : last page to convert
-verbose : display pdf attributes
-noText : do not extract textual objects
-noImage : do not extract Images (Bitmap and Vectorial)
-noImageInline : do not include images inline in the stream
-outline : create an outline file xml
-annots : create an annotations file xml
-cutPages : cut all pages in separately files
-blocks : add blocks informations whithin the structure
-fullFontName : fonts names are not normalized
-nsURI <string> : add the specified namespace URI
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)
-q : don't print any messages or errors
-v : print copyright and version info
-h : print usage information
-help : print usage information
--help : print usage information
-? : print usage information
[ERROR] org.grobid.service.process.GrobidRestProcessFiles: An unexpected exception occured: org.grobid.core.exceptions.GrobidException: [BAD_INPUT_DATA] org.grobid.core.exceptions.GrobidException: [BAD_INPUT_DATA] PDF to XML conversion failed with error code: 99
[DEBUG] org.grobid.core.utilities.IOUtilities: Removing D:\...\grobid-master\grobid-home\tmp\origin1660742701096340959.pdf
[DEBUG] org.grobid.service.process.GrobidRestProcessFiles: << org.grobid.service.process.GrobidRestProcessFiles.methodLogOut
The command is constructed with {{-annotation}} as an argument, but according to the usage message in the log trace the argument name should be {{-annots}}. If I manually replace it with {{-annots}} and run the command from the Windows command prompt, pdftoxml converts the PDF to XML successfully.
I checked the code, and {{-annotation}} is hard-coded as an argument there, with no configuration option to change it to {{-annots}}: https://github.com/kermitt2/grobid/blob/master/grobid-core/src/main/java/org/grobid/core/document/DocumentSource.java#L81
Can you please suggest a workaround or a possible solution?
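To illustrate the kind of change I have in mind, here is a minimal sketch of building the same argument list with the annotation flag passed in instead of hard-coded. The class `Pdf2XmlCommand` and its `build` method are hypothetical and not part of Grobid; the other flags are copied from the command in the log above, and the file names in `main` are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Grobid): builds the pdftoxml argument
 * list with the annotation flag supplied by the caller.
 */
final class Pdf2XmlCommand {

    private Pdf2XmlCommand() {
        // static helper only
    }

    /**
     * @param executable     path to the pdftoxml binary
     * @param annotationFlag "-annots" for the binary shown in the log,
     *                       "-annotation" for builds that accept that name
     * @param lastPage       value passed to -l (last page to convert)
     * @param pdfPath        input PDF
     * @param xmlPath        output XML file
     */
    static List<String> build(String executable, String annotationFlag,
                              int lastPage, String pdfPath, String xmlPath) {
        List<String> cmd = new ArrayList<>();
        cmd.add(executable);
        // Same options, in the same order, as the command in the log trace.
        cmd.add("-blocks");
        cmd.add("-noImageInline");
        cmd.add("-fullFontName");
        cmd.add("-noImage");
        cmd.add(annotationFlag);
        cmd.add("-l");
        cmd.add(Integer.toString(lastPage));
        cmd.add(pdfPath);
        cmd.add(xmlPath);
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder file names, for illustration only.
        List<String> cmd = build("pdftoxml_server", "-annots", 2,
                                 "input.pdf", "output.lxml");
        System.out.println(String.join(" ", cmd));
    }
}
```

The resulting list could be handed to `new ProcessBuilder(cmd)`. Whether the flag should come from a property file or be detected from the binary's usage message is left open here.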
Issue Analytics
- State:
- Created 6 years ago
- Comments:7 (2 by maintainers)
Top GitHub Comments
@vishnudas-raveendran Grobid is not supported on Windows. Unfortunately, three platforms are too many for us to maintain, so I recommend running it with Docker.
See some more information: https://grobid.readthedocs.io/en/latest/Troubleshooting/#windows-related-issues
Hi @lfoppiano, I cloned the latest Grobid today (v0.6.1) and ran into the PDF to XML conversion error. I am attaching grobid-service.log and console.log. It works fine when running from the cloud-miner service, but I need to run it locally.
Grobid-service.log file: grobid-service.log
Console log from running the batch from GrobidMain:
Screenshot of running as a service:
The result is the same for PDF files from any source.
Let me know if you need any more info.