SSL errors crawling https sites using proxies

I’m unable to scrape https sites through proxies that support https. I’ve tried with proxymesh as well as other proxy services. I can scrape most of these sites without proxies or using Tor.

Curl seems to work fine too:

curl -x https://xx.xx.xx.xx:xx --proxy-user user:pass -L https://www.base.net:443

It retrieves the site’s HTML.

Setup:

OS: OS X El Capitan v10.11.3

Scrapy:

scrapy version -v
Scrapy    : 1.0.5
lxml      : 3.5.0.0
libxml2   : 2.9.2
Twisted   : 15.5.0
Python    : 2.7.11 (default, Dec  7 2015, 23:36:10) - [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)]
pyOpenSSL : 0.15.1 (OpenSSL 1.0.2g  1 Mar 2016)
Platform  : Darwin-15.3.0-x86_64-i386-64bit

Solutions tried:

1 - Installing Scrapy-1.1.0rc3
2016-03-09 12:44:59 [scrapy] ERROR: Error downloading <GET https://www.base.net/>: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'SSL23_GET_SERVER_HELLO', 'unknown protocol')]>]
Other website:
2016-03-09 12:56:45 [scrapy] DEBUG: Retrying <GET https://es.alojadogatopreto.com/es-es/> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl23_read', 'ssl handshake failure')]>]

2 - https://github.com/scrapy/scrapy/issues/1764#issuecomment-181950638
Using SSLv23_METHOD:
2016-03-09 12:22:40 [scrapy] ERROR: Error downloading <GET https://www.base.net/>: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'SSL23_GET_SERVER_HELLO', 'unknown protocol')]>]
Using other SSL methods:
2016-03-09 12:24:11 [scrapy] ERROR: Error downloading <GET https://www.base.net/>: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'SSL3_GET_RECORD', 'wrong version number')]>]

3 - https://github.com/scrapy/scrapy/issues/1227#issuecomment-154890557 | Get same errors as in 1 & 2.

4 - https://github.com/scrapy/scrapy/issues/1429#issuecomment-131187012 | Get same errors as in 1 & 2.

Issue Analytics

State:
Created 8 years ago
Comments:7 (2 by maintainers)

Top GitHub Comments

Cesped commented, Mar 9, 2016

Thanks for answering @redapple.

The solution was changing base64.encodestring to base64.b64encode in my ProxyMiddleware. I ran scrapy shell 'https://www.base.net' a few times and printed request.meta. The value for meta['proxy'] changes each time and corresponds to those in my proxy list.
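
For anyone hitting the same thing, here is a minimal sketch of what such a middleware might look like. The original ProxyMiddleware was not posted, so the class name, proxy URLs and credentials below are hypothetical placeholders. The relevant difference is that the legacy encodestring helper appends a newline to its output, which ends up inside the Proxy-Authorization header, while base64.b64encode does not.

```python
import base64
import random


def basic_proxy_auth(user, password):
    """Build a Proxy-Authorization value with no trailing newline."""
    credentials = "{}:{}".format(user, password).encode("utf-8")
    # b64encode returns the encoded bytes without any line breaks, unlike
    # the legacy encodestring/encodebytes helpers, which append "\n".
    return b"Basic " + base64.b64encode(credentials)


class RandomProxyMiddleware(object):
    """Hypothetical downloader middleware: picks one proxy per request."""

    # Placeholder values (assumptions); replace with real proxies and credentials.
    proxies = [
        "http://proxy1.example.com:8080",
        "http://proxy2.example.com:8080",
    ]
    user = "user"
    password = "pass"

    def process_request(self, request, spider):
        # Scrapy reads the proxy to use from request.meta['proxy'].
        request.meta["proxy"] = random.choice(self.proxies)
        request.headers["Proxy-Authorization"] = basic_proxy_auth(
            self.user, self.password
        )
        # Returning None (implicitly) lets Scrapy continue handling the request.
```

The class would still need to be enabled through DOWNLOADER_MIDDLEWARES in the project settings; the module path and priority depend on the project.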

Gallaecio commented, Oct 30, 2020

Alternatively, you can use w3lib.http.basic_auth_header to build the header value.