DecompressionBombError: Image size exceeds limit of x pixels, could be decompression bomb DOS attack

Describe the issue

Forcing OCR with --force-ocr results in PIL.Image.DecompressionBombError.

To Reproduce

$ ocrmypdf -v -l deu+eng --force-ocr --sidecar out.txt  /path/to/input.pdf out.pdf
  DEBUG - ocrmypdf 9.0.0
  DEBUG - os.symlink(/path/to/input.pdf, /var/folders/pf/7vhqx5bn41qddypw08w9jc4w0000gn/T/com.github.ocrmypdf.civmiyes/origin)
  DEBUG - os.symlink(/var/folders/pf/7vhqx5bn41qddypw08w9jc4w0000gn/T/com.github.ocrmypdf.civmiyes/origin, /var/folders/pf/7vhqx5bn41qddypw08w9jc4w0000gn/T/com.github.ocrmypdf.civmiyes/origin.pdf)
Scan: 100%|...| 1/1 [00:00<00:00, 59.04page/s]
   INFO - Using Tesseract OpenMP thread limit 4
   INFO -    1: page already has text! - rasterizing text and running OCR anyway                                                                                                 
  DEBUG -    1: Rasterize with png16m                                                                                                                                            
  DEBUG -    1: ['gs', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-sDEVICE=png16m', '-dFirstPage=1', '-dLastPage=1', '-r2897.060413x2897.060413', '-o', '/var/folders/pf/7vhqx5bn41qddypw08w9jc4w0000gn/T/tmpeam5vcbf', '-dAutoRotatePages=/None', '-f', '/var/folders/pf/7vhqx5bn41qddypw08w9jc4w0000gn/T/com.github.ocrmypdf.civmiyes/origin.pdf']
OCR:   0%|                                                                                                                                           | 0.0/1.0 [00:00<?, ?page/s]
  ERROR - An exception occurred while executing the pipeline
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/Cellar/ocrmypdf/9.0.0/libexec/lib/python3.7/site-packages/ocrmypdf/_sync.py", line 100, in exec_page_sync
    remove_vectors=False,
  File "/usr/local/Cellar/ocrmypdf/9.0.0/libexec/lib/python3.7/site-packages/ocrmypdf/_pipeline.py", line 446, in rasterize
    filter_vector=remove_vectors,
  File "/usr/local/Cellar/ocrmypdf/9.0.0/libexec/lib/python3.7/site-packages/ocrmypdf/exec/ghostscript.py", line 175, in rasterize_pdf
    with Image.open(tmp) as im:
  File "/usr/local/Cellar/ocrmypdf/9.0.0/libexec/lib/python3.7/site-packages/PIL/Image.py", line 2690, in open
    im = _open_core(fp, filename, prefix)
  File "/usr/local/Cellar/ocrmypdf/9.0.0/libexec/lib/python3.7/site-packages/PIL/Image.py", line 2677, in _open_core
    _decompression_bomb_check(im.size)
  File "/usr/local/Cellar/ocrmypdf/9.0.0/libexec/lib/python3.7/site-packages/PIL/Image.py", line 2611, in _decompression_bomb_check
    (pixels, 2 * MAX_IMAGE_PIXELS))
PIL.Image.DecompressionBombError: Image size (811374000 pixels) exceeds limit of 256000000 pixels, could be decompression bomb DOS attack.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/Cellar/ocrmypdf/9.0.0/libexec/lib/python3.7/site-packages/ocrmypdf/_sync.py", line 338, in run_pipeline
    exec_concurrent(context)
  File "/usr/local/Cellar/ocrmypdf/9.0.0/libexec/lib/python3.7/site-packages/ocrmypdf/_sync.py", line 269, in exec_concurrent
    page_result = results.next()
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
PIL.Image.DecompressionBombError: Image size (811374000 pixels) exceeds limit of 256000000 pixels, could be decompression bomb DOS attack.
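
The numbers in the log can be sanity-checked with a little arithmetic. The following is only a sketch: `bomb_check` is a simplified stand-in written for illustration (it is not Pillow's code), and the constants are copied from the gs command line and the traceback above. The traceback shows the reported limit is 2 * MAX_IMAGE_PIXELS, so MAX_IMAGE_PIXELS in this installation works out to 128,000,000.

```python
import math

# Values copied from the log and traceback above.
PIXELS = 811_374_000          # image size reported in the error
REPORTED_LIMIT = 256_000_000  # "exceeds limit of ..." = 2 * MAX_IMAGE_PIXELS
DPI = 2897.060413             # from the -r argument passed to gs


def bomb_check(pixels, max_image_pixels):
    """Simplified two-tier check: warn above the limit, fail above twice the limit.

    Illustrative re-implementation only, not Pillow's actual function.
    """
    if pixels > 2 * max_image_pixels:
        return "error"
    if pixels > max_image_pixels:
        return "warning"
    return "ok"


max_image_pixels = REPORTED_LIMIT // 2

# Pixel count = page area (square inches) * DPI^2, so the area can be recovered.
page_area_sq_in = PIXELS / DPI**2

# Pixel count scales with DPI^2, so this is the highest DPI that would
# stay under the hard limit for the same page.
dpi_at_limit = DPI * math.sqrt(REPORTED_LIMIT / PIXELS)

print(bomb_check(PIXELS, max_image_pixels))
print(f"page area: {page_area_sq_in:.1f} sq in")
print(f"DPI that would fit under the limit: {dpi_at_limit:.0f}")
```

The recovered area is close to that of an A4 page, although the thread does not state the page size. If that reading is right, the page is ordinary and the unusual part is the roughly 2897 DPI that was chosen for rasterizing it.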

Example file

The file contains personal information. Happy to provide the encrypted version though.

Expected behavior

It worked for many similar PDFs. I would have expected this to just work the same way.

System:

  • OS: macOS 10.13.6
  • OCRmyPDF Version: 9.0.0

Additional context

This is not a scanned PDF but a generated one.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 11

Top GitHub Comments

jbarlow83 commented, Aug 9, 2019 (2 reactions)

--image-dpi has no effect except when the input file is an image. There’s supposed to be a warning message that explains this (although maybe it should be an error).

You can use Ghostscript to downsample:

$ gs -q -sDEVICE=pdfwrite  -dDownsampleColorImages=true -dDownsampleGrayImages=true -o out.pdf tests/resources/2400dpi.pdf
$ pdfimages -list out.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image     250   179  gray    1   8  image  no         8  0   150   150 2491B 5.6%

The Ghostscript documentation describes options for fine-tuning the resolution.

Since Ghostscript can do this, I don’t think ocrmypdf should implement the same feature.
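
For anyone scripting this, here is a rough sketch of the same Ghostscript call built from Python with an explicit target resolution. The helper name `gs_downsample_command`, the file names, and the 300 DPI default are made up for illustration. `-dColorImageResolution` and `-dGrayImageResolution` are the pdfwrite parameters assumed here to set the target resolution; check the Ghostscript documentation for your version before relying on them.

```python
import shlex


def gs_downsample_command(src, dst, dpi=300):
    """Build (but do not run) a gs command that downsamples images in a PDF.

    Hypothetical helper: mirrors the command in the comment above and adds
    explicit target resolutions for color and grayscale images.
    """
    return [
        "gs", "-q", "-sDEVICE=pdfwrite",
        "-dDownsampleColorImages=true",
        "-dDownsampleGrayImages=true",
        f"-dColorImageResolution={dpi}",
        f"-dGrayImageResolution={dpi}",
        "-o", dst, src,
    ]


cmd = gs_downsample_command("input.pdf", "downsampled.pdf", dpi=300)
print(shlex.join(cmd))

# To actually run it (requires Ghostscript on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

The downsampled file can then be passed to ocrmypdf in place of the original, and `pdfimages -list` shows whether the image resolution actually changed.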

mohsin127 commented, Jan 24, 2020 (1 reaction)

Solved: open the file “/usr/local/Cellar/ocrmypdf/9.0.0/libexec/lib/python3.7/site-packages/PIL/Image.py” and change MAX_IMAGE_PIXELS to whatever limit you need.
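
When Pillow is used from your own Python code, the same limit can be raised at runtime by assigning to `Image.MAX_IMAGE_PIXELS`, which avoids editing the installed file. The sketch below assumes you control the Python process; the thread does not confirm whether this has any effect on an ocrmypdf run started from the command line. `required_max_image_pixels` is a hypothetical helper, and the pixel count is the one from the traceback.

```python
import math


def required_max_image_pixels(pixels):
    """Smallest MAX_IMAGE_PIXELS for which `pixels` stays at or below 2x the limit."""
    return math.ceil(pixels / 2)


needed = required_max_image_pixels(811_374_000)  # pixel count from the traceback
print(needed)

try:
    from PIL import Image
except ImportError:
    Image = None  # Pillow is not installed; nothing to configure

if Image is not None:
    # Affects only this Python process; the installed Image.py is untouched.
    Image.MAX_IMAGE_PIXELS = needed
```

With the limit set to exactly this value the error no longer triggers for that image, but Pillow would still emit a DecompressionBombWarning because the image remains larger than MAX_IMAGE_PIXELS itself. The limit exists to guard against malicious files, so raise it only for inputs you trust.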
