File size increase for PDF/A
OCRmyPDF is really marvelous! Thanks!
I have one question regarding output file size: unless I explicitly select pdf as the output type, I get quite large files (~4x) after “ocrmypdf in.pdf out.pdf”. The pages are scanned text, i.e. there are actually no gray pixels, only black or white ones. Only “--output-type pdf” keeps the file size similar.
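For anyone who wants to reproduce the comparison, the two invocations can be scripted so the output sizes are printed side by side. This is only a sketch: the helper names `build_cmd` and `compare_sizes` and the output file names are made up for illustration, and it assumes `ocrmypdf` is on the PATH and that `in.pdf` exists. The only option varied is `--output-type`.

```python
import os
import shutil
import subprocess


def build_cmd(src, dst, output_type=None):
    """Build an ocrmypdf command line; output_type=None keeps the default (PDF/A)."""
    cmd = ["ocrmypdf"]
    if output_type is not None:
        cmd += ["--output-type", output_type]
    return cmd + [src, dst]


def compare_sizes(src):
    """Run both variants and return {label: output size in bytes}."""
    sizes = {}
    for label, output_type in (("default", None), ("pdf", "pdf")):
        dst = f"out-{label}.pdf"  # hypothetical output names
        subprocess.run(build_cmd(src, dst, output_type), check=True)
        sizes[label] = os.path.getsize(dst)
    return sizes


if __name__ == "__main__":
    # Only runs when ocrmypdf is installed and in.pdf exists (both assumptions).
    if shutil.which("ocrmypdf") and os.path.exists("in.pdf"):
        before = os.path.getsize("in.pdf")
        for label, size in compare_sizes("in.pdf").items():
            print(f"{label}: {size} bytes ({size / before:.1f}x input)")
```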
For the first page (the others are similar) “pdfimages -list in.pdf” gives:
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    2697  4533  icc     1   1  ccitt  no        17  0   600   601 88.3K 5.9%
out.pdf results in:
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    2697  4533  rgb     3   8  image  no        12  0   600   601  385K 1.1%
Even --optimize 3 results in double the file size for out.pdf (saved as PDF/A):
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    2697  4533  gray    1   1  image  no        34  0   600   601  203K  14%
Is a conversion obligatory for PDF/A? Or is there a way to keep the original image type AND generate PDF/A?
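The three listings already account for most of the size difference. As a rough check, the sketch below recomputes the uncompressed size and the `ratio` column from the width, height, `comp` and `bpc` values quoted above. It assumes that `K` means 1024 bytes and that the ratio is the stored size divided by the raw size; both are assumptions about how pdfimages reports these columns, and the helper names are illustrative.

```python
def raw_bytes(width, height, comp, bpc):
    """Uncompressed image size in bytes (ignores per-row padding)."""
    return width * height * comp * bpc / 8


def size_to_bytes(text):
    """Convert a size like '88.3K' to bytes, assuming K = 1024 and M = 1024**2."""
    units = {"B": 1, "K": 1024, "M": 1024 ** 2}
    return float(text[:-1]) * units[text[-1]]


# (label, width, height, comp, bpc, stored size) taken from the three listings.
rows = [
    ("in.pdf, ccitt 1-bit", 2697, 4533, 1, 1, "88.3K"),
    ("out.pdf default, rgb 8-bit", 2697, 4533, 3, 8, "385K"),
    ("out.pdf --optimize 3, gray 1-bit", 2697, 4533, 1, 1, "203K"),
]

baseline = size_to_bytes(rows[0][5])
for label, width, height, comp, bpc, size in rows:
    stored = size_to_bytes(size)
    ratio = stored / raw_bytes(width, height, comp, bpc)
    print(f"{label}: ratio {ratio:.1%}, {stored / baseline:.1f}x the input image")
```

Under those assumptions the recomputed ratios come out at about 5.9%, 1.1% and 13.6% (pdfimages shows 14%), and the stored images are about 4.4x and 2.3x the original, which fits the "~4x" and "double" observations. The gray 1-bit image has the same raw size as the input, so the growth in that case comes from the different encoding (ccitt in the input versus image in the output), not from extra pixel data.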
Top GitHub Comments
Thank you for the explanation. But one thing I don’t understand:
Did you mean “without” instead of “with”? With “--output-type pdf” I get small output file sizes. Therefore, I think this option carries the original PDF images over into the final PDF.
I would think that most scanned PDFs contain old documents, many of which are only b/w. That is, I don’t think b/w is such a rare special case, do you? What do you think of an option to convert gray images to b/w for output? Gray images would be better for tesseract (?) and b/w output would be better for reading and for file size. This would be the same reasoning as for --clean-final, but --clean-final doesn’t convert to monochrome. (Apart from the programming effort, of course …).
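To make the proposal concrete: converting gray to b/w amounts to thresholding each pixel and packing eight pixels into one byte. The sketch below is not an OCRmyPDF feature or its implementation. The function name `gray_to_bilevel`, the fixed threshold of 128, and the MSB-first packing with 1 = white are all illustrative assumptions.

```python
def gray_to_bilevel(rows, threshold=128):
    """Threshold 8-bit gray rows to 1-bit rows, packed MSB-first (1 = white).

    Each output row is padded with zero bits to a whole number of bytes.
    """
    packed_rows = []
    for row in rows:
        out = bytearray((len(row) + 7) // 8)
        for x, value in enumerate(row):
            if value >= threshold:
                out[x // 8] |= 0x80 >> (x % 8)
        packed_rows.append(bytes(out))
    return packed_rows


# A tiny 2 x 10 "scan": light paper (about 250) with a few dark strokes (about 10).
gray = [
    [250, 250, 10, 10, 250, 250, 250, 10, 250, 250],
    [250, 10, 10, 10, 10, 250, 250, 250, 250, 10],
]
bilevel = gray_to_bilevel(gray)
gray_size = sum(len(row) for row in gray)        # one byte per pixel
bilevel_size = sum(len(row) for row in bilevel)  # one bit per pixel, padded
print(gray_size, bilevel_size)
```

Before any compression, the 1-bit data is already about one eighth the size of the 8-bit gray data, and bilevel images can then use encodings such as CCITT or JBIG2.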
very good idea 😃
I know this might not be the right place, but I didn’t want to create a new “issue”.
Could you please tell a bit more about your workflow? I have documents which contain just black text with a small color graphic (corporate logo). Additionally, one page has signatures which are made with a blue pencil. What would be the best way to scan and process this kind of document (contracts)? I don’t want to lose the color information.
Currently I’m scanning it as color text at 600dpi (using VueScan). After passing it through ocrmypdf the size was not reduced much (~700KB). When using jbig2-lossy compression 3, the size was halved (14MB -> 7MB for an 8-page document). I’m perfectly fine with that size for storing it locally.
However, sometimes I want to send this kind of document by e-mail, and in that case even 7MB is not optimal. I would like to convert it to b/w, as it does not matter whether the corporate logo is red-blue or just black. What would be the harm in having a save/convert as b/w option?
Maybe I’m missing something here and there is a different way to reduce the file size?
Thanks for the excellent app!