EXIF parsing causing memory explosion


Thumbor request URL

http://thumbor-host.com/unsafe/1400x0/filters:no_upscale()/file-host.com/GettyImages_539746540.jpg

(Host names have been swapped for fake ones.) The image used in the example request, which has corrupted/invalid EXIF data: GettyImages_539746540.jpg.zip

Expected behaviour

An image should be returned.

Actual behaviour

No response is received; thumbor crashes after running out of memory. A reverse proxy in front of thumbor may respond with a timeout code if that happens first.

Operating system

Ubuntu

Your thumbor.conf

## IMAGE PROCESSING ##
ENGINE = 'thumbor.engines.pil'

DETECTORS = [
    'thumbor.detectors.face_detector',
    'thumbor.detectors.feature_detector'
]

FILTERS = [
    'thumbor.filters.blur',
    'thumbor.filters.colorize',
    'thumbor.filters.extract_focal',
    'thumbor.filters.format',
    'thumbor.filters.focal',
    'thumbor.filters.no_upscale',
    'thumbor.filters.quality',
    'thumbor.filters.saturation',
    'thumbor.filters.fill',
]

OPTIMIZERS = [
    'thumbor.optimizers.jpegtran',
    'thumbor.optimizers.gifv',
    'thumbor_plugins.optimizers.autojpeg',
]
JPEGTRAN_PATH = '/usr/bin/jpegtran'
FFMPEG_PATH = '/usr/bin/ffmpeg'

ALLOW_ANIMATED_GIFS = True
USE_GIFSICLE_ENGINE = True
PRESERVE_EXIF_INFO = False
AUTO_WEBP = True

QUALITY = 80
WEBP_QUALITY = 80

AUTOJPEG_QUALITY='90'
AUTOJPEG_SUBSAMPLING='0'

GC_INTERVAL=60

ENGINE_THREADPOOL_SIZE=12

I already know what is causing the issue. As documented at https://github.com/hMatoba/Piexif/issues/90, the piexif library has a bug that causes the memory explosion. Both piexif and the older pexif library seem to have this bug.

We aren’t even using RESPECT_ORIENTATION on our cluster, but I found that thumbor parses the EXIF data regardless of that setting; the parsed data is only used when the setting is enabled. Parsing EXIF data that isn’t needed is probably a bad idea, since it takes time. Disabling EXIF parsing outright is not a good long-term fix for this problem either, though that is exactly what we have done as an interim patch.

I am also a bit perplexed as to why we are using the piexif library at all. The pillow library can very easily return the EXIF data. The example below doesn’t have orientation data, but it would appear under tag 274 if it existed.

from PIL import Image

im = Image.open("GettyImages_539746540.jpg")
im._getexif()  # private Pillow helper; returns a dict keyed by numeric EXIF tag

/usr/local/lib/python2.7/site-packages/PIL/TiffImagePlugin.py:768: UserWarning: Possibly corrupt EXIF data.  Expecting to read 34225520648 bytes but only got 104. Skipping tag 33437
  " Skipping tag %s" % (size, len(data), tag))
/usr/local/lib/python2.7/site-packages/PIL/TiffImagePlugin.py:768: UserWarning: Possibly corrupt EXIF data.  Expecting to read 33685506 bytes but only got 0. Skipping tag 34850
  " Skipping tag %s" % (size, len(data), tag))
{36864: '0221', 37377: (9965784, 1000000), 37378: (4970854, 1000000), 36867: u'2008:02:21 14:18:14', 36868: u'2008:02:21 14:18:14', 37381: (3, 1), 41990: 0, 37383: 6, 37385: 16, 37386: (300, 1), 41986: 0, 270: u'Austin, TX February 21, 2008: Supporters of candidates Hillary Clinton and Barack Obama line up outside the Rec Sports Center at the Univeristy of Texas at Austin hours prior to the debate between the Democratic candidates Thursday.', 271: u'Canon', 272: u'Canon EOS 5D', 41987: 0, 33432: u'Bob Daemmrich Photography, Inc.', 37380: (-1, 3), 282: (300, 1), 283: (300, 1), 33434: (1, 1000), 34855: 160, 296: 2, 306: u'2008:02:21 14:46:13', 315: u'Bob Daemmrich', 41985: 0, 41486: (4368000, 1415), 41487: (2912000, 942), 41488: 2, 34665: 470}

^ As you can see, pillow encounters the same bad EXIF data, but it skips the corrupt tags with a warning instead of OOMing any servers, which is a nice feature :D
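To make the tag 274 lookup concrete, here is a minimal sketch of pulling the orientation out of a dict shaped like the one pillow printed. The helper name `orientation_from_exif` is hypothetical and is not part of thumbor or pillow; it only assumes a mapping from numeric EXIF tag to value.

```python
ORIENTATION_TAG = 274  # EXIF/TIFF "Orientation" tag (0x0112)


def orientation_from_exif(exif):
    """Return the EXIF orientation (1-8) from a tag -> value mapping, or None.

    Hypothetical helper for illustration; accepts None or an empty mapping.
    """
    if not exif:
        return None
    value = exif.get(ORIENTATION_TAG)
    # Only 1..8 are defined orientations; treat anything else as missing.
    if isinstance(value, int) and 1 <= value <= 8:
        return value
    return None


# Abbreviated copy of the dict shown above: no key 274, so no orientation.
sample = {271: u'Canon', 272: u'Canon EOS 5D', 296: 2, 34855: 160}
print(orientation_from_exif(sample))
```

Treating out-of-range values as missing means a corrupt orientation entry falls back to "no reorientation" rather than raising.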

I think we should:

  1. Use pillow to extract EXIF orientation data (it’s no more cryptic than piexif)
  2. Remove piexif as a dependency
  3. Stop parsing EXIF data unless it might be used in some later operation (reorientation)

^ I’m happy to code this up if folks feel that is a good solution. I’d like to know what people think.
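A rough sketch of points 1 and 3 together, under some assumptions: `read_orientation` and its `respect_orientation` argument are hypothetical names standing in for wherever thumbor consults RESPECT_ORIENTATION, and the code uses pillow's public `Image.getexif()` (available in Pillow 6 and later) rather than the private `_getexif()` shown earlier. It is not how thumbor is actually structured.

```python
import warnings

ORIENTATION_TAG = 274  # EXIF/TIFF "Orientation" tag (0x0112)


def read_orientation(path, respect_orientation):
    """Return the EXIF orientation for the image at `path`, or None.

    Hypothetical helper: EXIF is not parsed at all unless the caller
    says the orientation will actually be used.
    """
    if not respect_orientation:
        return None  # skip EXIF parsing entirely

    # Imported lazily so the no-op path above does not need pillow loaded.
    from PIL import Image

    try:
        with warnings.catch_warnings():
            # pillow warns about corrupt tags and skips them; silence that here.
            warnings.simplefilter("ignore")
            with Image.open(path) as im:
                return im.getexif().get(ORIENTATION_TAG)
    except (OSError, SyntaxError, ValueError):
        # Unreadable file or unparseable metadata: behave as if no orientation.
        return None
```

With the flag disabled the function returns immediately, so a corrupt EXIF block is never touched on clusters that don't reorient images.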

Issue Analytics

  • State:closed
  • Created 4 years ago
  • Reactions:1
  • Comments:9 (9 by maintainers)
