Output PDF is getting distorted on each ocrmypdf command.
Hi,
Please see the attached image where it shows the output PDF is getting distorted on each ocrmypdf command.
FYI, we use the auto-rotate options (--rotate-pages --rotate-pages-threshold 1) only for the first version; for the remaining versions of the PDF, we do not use the auto-rotate option.
sudo ocrmypdf --verbose 1 --force-ocr -l eng --output-type pdf --rotate-pages --rotate-pages-threshold 1 v_1.0.pdf v_1.1.pdf
sudo ocrmypdf --verbose 1 --force-ocr -l eng --output-type pdf v_1.1.pdf v_1.2.pdf
sudo ocrmypdf --verbose 1 --force-ocr -l eng --output-type pdf v_1.2.pdf v_1.3.pdf
sudo ocrmypdf --verbose 1 --force-ocr -l eng --output-type pdf v_1.3.pdf v_1.4.pdf
NOTE: OCRMyPDF version: 7.0.0
Could you please help me with this?
Also, if I add the --oversample 600 option to the command for each version, it works fine, but the output PDF size increases.
sudo ocrmypdf --verbose 1 --force-ocr -l eng --output-type pdf --oversample 600 --rotate-pages --rotate-pages-threshold 1 v_2.0.pdf v_2.1.pdf
sudo ocrmypdf --verbose 1 --force-ocr -l eng --output-type pdf --oversample 600 v_2.1.pdf v_2.2.pdf
sudo ocrmypdf --verbose 1 --force-ocr -l eng --output-type pdf --oversample 600 v_2.2.pdf v_2.3.pdf
sudo ocrmypdf --verbose 1 --force-ocr -l eng --output-type pdf --oversample 600 v_2.3.pdf v_2.4.pdf
Thanks.
Issue Analytics
- State:
- Created 5 years ago
- Comments:15
Top GitHub Comments
Use --optimize 0 and --output-type pdf to disable recompression.
Image resolution never changes by default but recompression can occur.
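The suggestion above can be sketched in Python, in the same style as the subprocess call quoted later in this thread. This is only a minimal sketch: the helper names `build_ocrmypdf_command` and `run_ocrmypdf` and the `extra_args` parameter are hypothetical, introduced here for illustration, and the file names are the ones from the original report. Only flags that already appear in this thread are used.

```python
import subprocess


def build_ocrmypdf_command(input_pdf, output_pdf, language="eng", extra_args=None):
    """Build an ocrmypdf argument list with optimization turned off.

    Hypothetical helper: it only assembles the flags mentioned in this
    thread (--optimize 0 and --output-type pdf) plus the language option.
    """
    cmd = [
        "ocrmypdf",
        "--optimize", "0",       # skip the optimization step
        "--output-type", "pdf",  # plain PDF output, as in the original commands
        "-l", language,
    ]
    if extra_args:
        # Any other flags, e.g. ["--rotate-pages", "--rotate-pages-threshold", "1"]
        cmd.extend(extra_args)
    cmd.extend([input_pdf, output_pdf])
    return cmd


def run_ocrmypdf(input_pdf, output_pdf, **kwargs):
    """Run ocrmypdf; assumes the ocrmypdf executable is on PATH."""
    cmd = build_ocrmypdf_command(input_pdf, output_pdf, **kwargs)
    # check=True raises CalledProcessError if ocrmypdf exits with an error
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the command for the first step of the chain from the report.
    first_step = build_ocrmypdf_command(
        "v_1.0.pdf",
        "v_1.1.pdf",
        extra_args=["--rotate-pages", "--rotate-pages-threshold", "1"],
    )
    print(" ".join(first_step))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.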
On Sun., Apr. 26, 2020, 13:30 Laurent Meyer, notifications@github.com wrote:
Good evening,
I’m experiencing a similar problem, but I have a conceptual question: why is OCRmyPDF changing the image output at all? I thought that would not be the case, as I read in the readme:
My case is the following: I have a long screenshot (of a webpage) that I cut into many pieces (via Pillow, lossless). After this operation the PNG looks like this:
After that, I convert it to PDF and the output looks like the following:
And then I OCRmyPDF the file:
subprocess.run(["ocrmypdf", "-l", "eng+deu+fra", "--threshold", "../pdfs/yourfile.pdf", "../pdfs/mvp.pdf"])
and I get some noise around the letters (it does the same without --threshold). Also, the size of the PDF went from 2.3MB to 812KB, but I would have preferred no compression at all…
Am I missing something?