
Output PDF is getting distorted on each ocrmypdf command.

See original GitHub issue: https://github.com/jbarlow83/OCRmyPDF/issues/316

Hi,

Please see the attached image, which shows the output PDF getting distorted with each ocrmypdf run.

[image: distorted_from_v1.0_to_v1.4]

FYI, we use the auto-rotate options (--rotate-pages --rotate-pages-threshold 1) only for the first version; for the remaining versions we do not use auto-rotation.

sudo ocrmypdf --verbose 1 --force-ocr -l eng --output-type pdf --rotate-pages --rotate-pages-threshold 1 v_1.0.pdf v_1.1.pdf

sudo ocrmypdf --verbose 1 --force-ocr -l eng --output-type pdf v_1.1.pdf v_1.2.pdf

sudo ocrmypdf --verbose 1 --force-ocr -l eng --output-type pdf v_1.2.pdf v_1.3.pdf

sudo ocrmypdf --verbose 1 --force-ocr -l eng --output-type pdf v_1.3.pdf v_1.4.pdf
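The four commands above differ only in their input/output file names and in whether auto-rotation is requested, so the chain can be generated in a loop when reproducing the problem. A minimal sketch in Python follows; the helper name `chain_commands` is hypothetical, `sudo` is left out, and the commands are only built and printed, not executed.

```python
def chain_commands(versions, language="eng"):
    """Build one ocrmypdf argument list per consecutive pair of file names.

    Hypothetical helper. Auto-rotation is requested only on the first pass,
    mirroring the commands above; each later pass re-OCRs the previous output.
    """
    commands = []
    for i, (src, dst) in enumerate(zip(versions, versions[1:])):
        args = ["ocrmypdf", "--verbose", "1", "--force-ocr",
                "-l", language, "--output-type", "pdf"]
        if i == 0:
            # Only the first pass auto-rotates pages.
            args += ["--rotate-pages", "--rotate-pages-threshold", "1"]
        args += [src, dst]
        commands.append(args)
    return commands


files = [f"v_1.{n}.pdf" for n in range(5)]  # v_1.0.pdf ... v_1.4.pdf
for cmd in chain_commands(files):
    print(" ".join(cmd))
    # To actually run a pass: subprocess.run(cmd, check=True)
```

Running this prints the same four invocations listed above (without `sudo`), which makes it easy to add or remove a flag for every pass at once.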

NOTE: OCRmyPDF version: 7.0.0

Could you please help me with this?

Also, if I add the --oversample 600 option to the command for each version, it works fine, but the output PDF size increases.

sudo ocrmypdf --verbose 1 --force-ocr -l eng --output-type pdf --oversample 600 --rotate-pages --rotate-pages-threshold 1 v_2.0.pdf v_2.1.pdf

sudo ocrmypdf --verbose 1 --force-ocr -l eng --output-type pdf --oversample 600 v_2.1.pdf v_2.2.pdf

sudo ocrmypdf --verbose 1 --force-ocr -l eng --output-type pdf --oversample 600 v_2.2.pdf v_2.3.pdf

sudo ocrmypdf --verbose 1 --force-ocr -l eng --output-type pdf --oversample 600 v_2.3.pdf v_2.4.pdf

Thanks.

Issue Analytics

Created 5 years ago
Comments: 15

Top GitHub Comments

1 reaction

jbarlow83 commented, Apr 26, 2020

Use --optimize 0 and --output-type pdf to disable recompression.

Image resolution never changes by default but recompression can occur.
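As a minimal sketch, the two flags from this comment can be added to an ocrmypdf invocation built in Python and passed to `subprocess.run`. The helper name `no_recompression_args` and its parameters are hypothetical; only the flags themselves come from the comment. Note that OCRmyPDF's default output type is PDF/A, which is why `--output-type pdf` has to be given explicitly.

```python
def no_recompression_args(input_pdf, output_pdf, language="eng", extra=()):
    """Return an ocrmypdf argument list with recompression disabled.

    Hypothetical helper. Following the maintainer's advice:
    --optimize 0 turns off OCRmyPDF's optimization step, and
    --output-type pdf produces a plain PDF instead of the default PDF/A.
    """
    args = ["ocrmypdf", "-l", language, "--optimize", "0", "--output-type", "pdf"]
    args.extend(extra)  # any other options, e.g. ("--force-ocr",)
    args += [input_pdf, output_pdf]
    return args


cmd = no_recompression_args("input.pdf", "output.pdf", language="eng+deu+fra")
print(" ".join(cmd))
# To run it: subprocess.run(cmd, check=True)
```

Keeping the options in a list (rather than one shell string) avoids quoting problems when file names contain spaces.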


0 reactions

lolobosse commented, Apr 26, 2020

Good evening,

I’m experiencing a similar problem, but I have a conceptual question: why does OCRmyPDF change the images at all? I thought it would not, since the README says:

Keeps the exact resolution of the original embedded images

My case is the following: I have a long screenshot (of a webpage) that I cut into many pieces (via Pillow, lossless); after this operation the PNG looks like this: