Scrapy incompatible with Twisted 17.1.0 on TLS/SSL-enabled websites
Referencing #2479
Running pip install scrapy in a new virtualenv installed Twisted 17.1.0.
Many HTTPS websites (e.g. Google) then started to fail with a Twisted error: SSL23_GET_SERVER_HELLO - tlsv1 alert internal error.
The main cause of the SSL23 error is that code introduced after #1794 in core/downloader/tls.py imports _maybeSetHostNameIndication, which has been removed from Twisted's _sslverify.py.
This code is also wrapped in a large try: ... except ImportError: pass block, which silences the actual cause of the problem.
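To illustrate why a bare except ImportError: pass hides this kind of breakage, here is a minimal sketch of an import guard that reports the failure instead of swallowing it. This is not Scrapy's actual code: the helper name optional_import is hypothetical, and the module path in the usage comment is my assumption about where _sslverify.py lives inside Twisted.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def optional_import(module_name, attr_name):
    """Return module_name.attr_name, or None if it cannot be imported.

    Unlike `except ImportError: pass`, the failure is logged, so a name
    that was removed upstream shows up in the logs instead of silently
    disabling the code that depends on it.
    """
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attr_name)
    except (ImportError, AttributeError) as exc:
        # Hypothetical handling: warn and fall back to None.
        logger.warning(
            "Optional import %s.%s is unavailable: %r",
            module_name, attr_name, exc,
        )
        return None


# Usage (assumed module path, shown only as an illustration):
# set_sni = optional_import("twisted.internet._sslverify",
#                           "_maybeSetHostNameIndication")
# if set_sni is None:
#     ...  # take an explicit fallback path rather than failing later
```

With a guard like this, the missing name would have produced a warning at import time rather than surfacing much later as an unrelated-looking SSL handshake error.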
Manually downgrading Twisted to 16.0.0 solved the problem. Also, to avoid such incompatibilities, shouldn't Scrapy on PyPI be limited to known working versions of its dependencies?
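As a rough sketch of what "limited to known working versions" could mean in practice, the check below compares a version string against an explicit allow-list. The names KNOWN_WORKING_TWISTED, parse_version and is_known_working are hypothetical, and the list contains only 16.0.0 because that is the single version this report confirms as working; it is not a statement about which Twisted releases Scrapy actually supports.

```python
import re

# Assumption: only the version confirmed in this report is listed.
KNOWN_WORKING_TWISTED = {(16, 0, 0)}


def parse_version(text):
    """Parse the leading 'X.Y.Z' of a version string into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognised version string: %r" % (text,))
    return tuple(int(group) for group in match.groups())


def is_known_working(installed, known=KNOWN_WORKING_TWISTED):
    """Return True if the installed version is on the allow-list."""
    return parse_version(installed) in known


# Usage: the two versions mentioned in this report.
# is_known_working("16.0.0")  -> on the allow-list
# is_known_working("17.1.0")  -> not on the allow-list
```

The manual workaround from this report corresponds to an exact pin such as pip install "Twisted==16.0.0".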
Scrapy : 1.3.1
lxml : 3.7.2.0
libxml2 : 2.9.3
cssselect : 1.0.1
parsel : 1.1.0
w3lib : 1.17.0
Twisted : 17.1.0
Python : 3.4.5 (default, Jan 24 2017, 17:55:08) - [GCC 4.9.3]
pyOpenSSL : 16.2.0 (OpenSSL 1.0.2j 26 Sep 2016)
Platform : Linux-4.6.0-sabayon-x86_64-Intel-R-_Core-TM-_i7-4710MQ_CPU_@_2.50GHz-with-gentoo-2.2

Thanks @Unode for the detailed report and for the debugging, it was helpful!
@kmike Glad to help!
@redapple - Looking forward to hearing more about the use-case 😃.