
IOError: cannot identify image file <cStringIO.StringI object at 0x0000000005796E00>


I got this error when using the ImagesPipeline to download some images (other images download fine), for example this image URL:

http://mmbiz.qpic.cn/mmbiz_png/NmpHEAE5bXl1WYvWBOQBhDeB7Dar8JhIr4tHxouT9AbhZJO7wF1lv2bCUU1UX0YPtU70OUF7RRfnMQpubRJYpA/0?wx_fmt=png

This image can’t be downloaded.

I think it is because the image type is JFIF, and the ImagesPipeline does not support it. I don’t know how to solve this problem, please help me.
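One way to test the JFIF theory is to look at the first bytes of the downloaded body instead of trusting the URL (which says wx_fmt=png). This is a minimal sketch, assuming you already have the raw bytes, e.g. response.body in a Scrapy callback; sniff_image_format is a hypothetical helper, not part of Scrapy or Pillow. JFIF is one of the container variants of JPEG, which Pillow normally reads, so a "jpeg" result would point to a cause other than an unsupported format.

```python
def sniff_image_format(body: bytes) -> str:
    """Hypothetical helper: guess a file's format from its leading 'magic' bytes."""
    if body.startswith(b"\xff\xd8\xff"):
        # JPEG, including its JFIF and Exif variants
        return "jpeg"
    if body.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if body[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if body.startswith(b"RIFF") and body[8:12] == b"WEBP":
        return "webp"
    if body.lstrip().lower().startswith((b"<!doctype html", b"<html")):
        # A web page rather than an image; Pillow cannot identify this
        return "html"
    return "unknown"
```

For example, logging sniff_image_format(response.body) for each failing URL shows whether the server actually returned an image at all.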

Issue Analytics

Created 6 years ago
Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction

cathalgarvey commented, Feb 15, 2018

Hi folks, this is marked as “not reproducible”, and it’s a bit stale, so I’m inclined to close it. However, if anyone finds a clearly reproducible example, please come back to us and discuss/reopen.

Something quick to consider: if you want images and you don’t need them resized, converted, or thumbnailed, you can use the FilesPipeline to download them instead. In fact, it should be faster and more efficient. This should be your first recourse if images that PIL does not support become a problem.
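A minimal sketch of switching to the FilesPipeline, which stores downloaded files as-is instead of opening them with PIL the way the ImagesPipeline does. The FILES_STORE path is an assumption, and make_file_item is a hypothetical helper; by default the FilesPipeline reads URLs from an item’s file_urls field and writes download results to its files field.

```python
# settings.py (sketch): use Scrapy's FilesPipeline instead of ImagesPipeline,
# so image bytes are saved without being decoded by PIL.
ITEM_PIPELINES = {
    "scrapy.pipelines.files.FilesPipeline": 1,
}

# Directory for downloaded files (assumed path; adjust as needed).
FILES_STORE = "downloaded_files"


def make_file_item(url):
    """Hypothetical helper: build an item the FilesPipeline will download.

    URLs go in 'file_urls' (the default input field); after the download,
    the pipeline records results in the item's 'files' field.
    """
    return {"file_urls": [url]}
```

In a spider callback you would then yield make_file_item(url) for each image URL you want saved.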

Finally @ZhouYangL: I haven’t attempted to replicate your shell session, especially as it uses an image that looks prone to changing, and it’s been a few months. But I suspect your issue lies in the statement browser.page_source.encode('utf-8'), which is being called on what seems to be binary image content. I am guessing you are using Python 2, because this kind of transform would be very unlikely to occur in Python 3. I suggest that the page contents are probably correct image binary as-is, and should not be encoded into a text encoding such as UTF-8.
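The corruption described in this comment can be reproduced in a few lines of Python 3. This is a minimal sketch: the sample bytes are only the start of a typical JPEG/JFIF header, and decoding them as Latin-1 is an assumption standing in for however the image bytes ended up as text.

```python
# First bytes of a typical JPEG/JFIF file: SOI marker, then the APP0 "JFIF" segment.
original = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"

# Assumption: the bytes were turned into text somewhere. Latin-1 maps each
# byte to exactly one character, so this step alone loses nothing...
as_text = original.decode("latin-1")

# ...but re-encoding that text as UTF-8, as in page_source.encode('utf-8'),
# turns every byte >= 0x80 into two bytes.
reencoded = as_text.encode("utf-8")

print(original == reencoded)          # False
print(reencoded[:4])                  # b'\xc3\xbf\xc3\x98' (JPEG signature gone)
print(len(original), len(reencoded))  # 11 15
```

Because the JPEG signature no longer appears at the start of the data, Pillow has nothing to recognize. Keeping the body as bytes end to end (for instance passing response.body straight to BytesIO) avoids the problem.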

Thanks everyone!

0 reactions

ZhouYangL commented, Nov 25, 2017

@redapple, @mr-huangyang, @MaGuiSen, @dydjiangtao, @gasbakid, please help me. When I use scrapy shell with the URL, there is no problem:

    url = 'https://img.alicdn.com/bao/uploaded/i2/133560252265971710/TB2ZRMHsFXXXXXZXFXXXXXXXXXX_!!0-rate.jpg'
    from PIL import Image
    from cStringIO import StringIO as BytesIO
    orig_image = Image.open(BytesIO(response.body))

But when I use Selenium together with Scrapy, I get the same error:

    from selenium.webdriver import Chrome
    from scrapy.http import HtmlResponse
    browser = Chrome()
    browser.get('https://img.alicdn.com/bao/uploaded/i2/133560252265971710/TB2ZRMHsFXXXXXZXFXXXXXXXXXX_!!0-rate.jpg')
    # page_source is the page's source as text, not the raw image bytes
    r = HtmlResponse('https://img.alicdn.com/bao/uploaded/i2/133560252265971710/TB2ZRMHsFXXXXXZXFXXXXXXXXXX_!!0-rate.jpg',
                     body=browser.page_source.encode('utf-8'), encoding='utf-8')
    orig_image = Image.open(BytesIO(r.body))

Then this line fails:

    orig_image = Image.open(BytesIO(r.body))

    /usr/local/lib/python2.7/site-packages/PIL/Image.pyc in open(fp, mode)
       2570         fp.close()
       2571     raise IOError("cannot identify image file %r"
    -> 2572                   % (filename if filename else fp))
       2573
       2574 #

IOError: cannot identify image file <cStringIO.StringI object at 0x102d0ef10>