Improve user experience for Windows 10
Hi
Describe the issue: I’ve managed to run OCRmyPDF.exe on Windows 10 without WSL.
To Reproduce: I’ve made a fork and added some quick fixes in this commit: https://github.com/dibu28/OCRmyPDF/commit/543088e79e8649e968d02d8fd268123255607dc1
Fixes are:
- In leptonica.py, the library name is liblept-5 instead of lept.
- In ghostscript.py: (1) the executable name is gswin64c.exe instead of gs; (2) NamedTemporaryFile doesn’t work properly, and gs could not modify the temp file because of an access denied error. As a temporary workaround I’m adding “_1” to the temp file name and then removing the file. There may be a better way.
- In _pipeline.py and helpers.py, symlinking into the temp folder on Windows requires admin privileges, so instead of symlinking I’m just copying the files.
- In _sync.py, os.path.samefile raises an error: “OSError: [WinError 1] Incorrect function: ‘nul’”
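The file-handling workarounds in this list could be sketched roughly as follows. This is a minimal illustration, not the code from the linked commit: the helper names `link_or_copy`, `safe_samefile`, and `reserve_temp_path` are hypothetical, and the temp-file helper swaps the “_1” rename for a plain `tempfile.mkstemp` call whose handle is closed before an external program touches the path.

```python
import os
import shutil
import tempfile
from pathlib import Path


def link_or_copy(src, dst):
    """Symlink dst -> src if possible; otherwise fall back to copying.

    Hypothetical helper. On Windows, os.symlink raises OSError when the
    process lacks the privilege to create symlinks, so we copy instead.
    Note that any other OSError also triggers the copy fallback.
    """
    try:
        os.symlink(src, dst)
    except (OSError, NotImplementedError):
        shutil.copy2(src, dst)


def safe_samefile(a, b):
    """Like os.path.samefile, but report False instead of raising OSError.

    Hypothetical helper covering cases such as the 'nul' device, where
    the comparison itself fails.
    """
    try:
        return os.path.samefile(a, b)
    except OSError:
        return False


def reserve_temp_path(suffix=".pdf"):
    """Create a temp file, close our handle, and return its path.

    Closing the handle first means no open handle from this process is
    left on the file when an external program such as Ghostscript opens
    it. The caller is responsible for deleting the file afterwards.
    """
    fd, name = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    return Path(name)
```

Because each helper falls back rather than checking the OS name, the same code path works on Linux and macOS, where the symlink simply succeeds.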
So after those changes and installing the dependencies, it started to work from the command line like this: OCRmyPDF.exe input.pdf output.pdf
Dependencies and binaries I’m using:
- https://www.python.org/ftp/python/3.7.5/python-3.7.5-amd64.exe
- https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-v5.0.0-alpha.20191030.exe
- https://github.com/ArtifexSoftware/ghostpdl-downloads/releases/download/gs950/gs950w64.exe
- https://github.com/qpdf/qpdf/releases/download/release-qpdf-9.0.2/qpdf-9.0.2-bin-msvc64.zip
Add paths to PATH variable:
set PATH=%PATH%;C:\Program Files\Tesseract-OCR;
set PATH=%PATH%;C:\Program Files\gs\gs9.50\bin;
set PATH=%PATH%;C:\qpdf\qpdf-9.0.2-bin-msvc64\qpdf-9.0.2\bin;
python setup.py build
OCRmyPDF.exe input.pdf output.pdf
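Before running the build, it can help to confirm that the PATH changes took effect. The following is a small sketch using only the standard library; the executable names in `REQUIRED_TOOLS` are assumptions based on the installers listed above, and `missing_tools` is a hypothetical helper, not part of OCRmyPDF.

```python
import shutil

# Assumed executable names for the Windows installers listed above.
REQUIRED_TOOLS = ["tesseract", "gswin64c", "qpdf"]


def missing_tools(names=REQUIRED_TOOLS, path=None):
    """Return the tool names that shutil.which cannot find.

    With path=None, shutil.which searches the current PATH variable.
    """
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```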
Expected behavior: Can we add some workarounds using conditions based on OS type?
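As a rough illustration of what such conditions could look like, here is a sketch that picks the Leptonica library name and the Ghostscript executable by platform. The function names are hypothetical, the values come from the fixes described in this issue, and other variants (for example a 32-bit Ghostscript build) are not handled.

```python
import sys


def leptonica_library_name(platform=sys.platform):
    """Return the library name to load: liblept-5 on Windows, lept elsewhere."""
    return "liblept-5" if platform == "win32" else "lept"


def ghostscript_executable(platform=sys.platform):
    """Return the Ghostscript command: gswin64c on Windows, gs elsewhere."""
    return "gswin64c" if platform == "win32" else "gs"
```

`sys.platform` is `"win32"` on Windows regardless of whether the interpreter is 32-bit or 64-bit, so a single comparison is enough for this check.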
System:
- OS: Windows 10
- OCRmyPDF Version: v9.0.5
Additional context
Issue Analytics
- Created 4 years ago
- Comments:57
Top GitHub Comments
import ocrmypdf
Traceback (most recent call last):
  File "<ipython-input-1-a81f3474d7ad>", line 1, in <module>
    import ocrmypdf
  File "C:\Users\22252\AppData\Roaming\Python\Python38\site-packages\ocrmypdf\__init__.py", line 10, in <module>
    from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
  File "C:\Users\22252\AppData\Roaming\Python\Python38\site-packages\ocrmypdf\leptonica.py", line 62, in <module>
    lept = ffi.dlopen(_libpath)
OSError: cannot load library 'D:\OCR\Tesseract-OCR\liblept-5.dll': error 0x7f
Please let me know how to fix this.
The first step will be for ocrmypdf to check in reasonable locations for Tesseract and GS, examining the registry or whatever, so PATH becomes an override.
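A minimal sketch of that lookup order, assuming nothing about how OCRmyPDF eventually implemented it: PATH is consulted first so it acts as the override, then a list of likely install locations is tried. The `find_program` helper and the glob patterns are hypothetical, and registry lookup is left out.

```python
import glob
import os
import shutil


def find_program(name, fallback_patterns=(), path=None):
    """Locate an executable, preferring PATH over default install locations.

    fallback_patterns is a sequence of glob patterns for likely install
    paths. Matches are tried in reverse lexicographic order, which only
    roughly approximates "newest version first".
    """
    found = shutil.which(name, path=path)
    if found:
        return found
    for pattern in fallback_patterns:
        for candidate in sorted(glob.glob(pattern), reverse=True):
            if os.path.isfile(candidate):
                return candidate
    return None


# Assumed default install locations, based on the paths quoted in this issue.
PROGRAM_FILES = os.environ.get("ProgramFiles", r"C:\Program Files")
GS_PATTERNS = [os.path.join(PROGRAM_FILES, "gs", "gs*", "bin", "gswin64c.exe")]
TESSERACT_PATTERNS = [os.path.join(PROGRAM_FILES, "Tesseract-OCR", "tesseract.exe")]
```

A caller could then use `find_program("gswin64c", GS_PATTERNS)` and treat a `None` result as "Ghostscript is not installed".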
I don’t believe I can bundle the GS installer unless I change OCRmyPDF to AGPL, and I’m not sure I want to do that. I believe everything else could be bundled.
As far as actually doing a Windows installer, bundling, or setting up a choco package, I am hoping the community will step up, because I haven’t made a Windows installer before or tried to package a Python application for Windows, and other people probably know how to get this off the ground faster than I can, even if I end up finishing it. I converted to Azure Pipelines for its better Windows support, so that ideally we can test and deploy for every distribution type in one shot.
ocrmypdf is a unique, more complex case because it uses Leptonica (an ABI-level binding to a C library) and relies on calls to third-party non-Python binaries. It will probably be necessary to spin off Leptonica into a separate package that gets compiled as a binary wheel, something I’ve already started work on. That means installer-generator programs that try to inspect the source code for dependencies are probably going to fail, because they usually look for Python-only dependencies.