SSL issue when scraping website

I have a spider that’s throwing the following error when trying to crawl this URL.

>>> fetch('https://vconnections.org/resources')
2015-08-12 10:07:28 [scrapy] INFO: Spider opened
2015-08-12 10:07:28 [scrapy] DEBUG: Retrying <GET https://vconnections.org/resources> (failed 1 times): [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
2015-08-12 10:07:33 [scrapy] DEBUG: Gave up retrying <GET https://vconnections.org/resources> (failed 2 times): [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Users/gmeans/.virtualenvs/backlink/lib/python2.7/site-packages/scrapy/shell.py", line 87, in fetch
    reactor, self._schedule, request, spider)
  File "/Users/gmeans/.virtualenvs/backlink/lib/python2.7/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "<string>", line 2, in raiseException
ResponseNeverReceived: [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]

Other SSL URLs work fine, and I tried implementing the solution from this previous issue:

https://github.com/scrapy/scrapy/issues/981

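from OpenSSL import SSL
from scrapy.core.downloader.contextfactory import ScrapyClientContextFactory
from twisted.internet._sslverify import ClientTLSOptions
from twisted.internet.ssl import ClientContextFactory


class CustomContextFactory(ScrapyClientContextFactory):
    def getContext(self, hostname=None, port=None):
        ctx = ClientContextFactory.getContext(self)
        # Enable all workarounds to SSL bugs as documented by
        # http://www.openssl.org/docs/ssl/SSL_CTX_set_options.html
        ctx.set_options(SSL.OP_ALL)
        if hostname:
            # Set the SNI hostname for the TLS handshake
            ClientTLSOptions(hostname, ctx)
        return ctx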

Scrapy==1.0.3 Twisted==15.3.0 pyOpenSSL==0.15.1

OpenSSL 1.0.1k 8 Jan 2015

Any ideas on what else I could try? Thanks!
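
One way to narrow down an error like this is to check which TLS protocol versions the server is willing to negotiate, independently of Scrapy. The following is only a rough sketch using Python 3's standard ssl module (the traceback above comes from Python 2.7, where these attributes are not available). The function names make_pinned_context and probe are made up for illustration, and certificate verification is switched off purely so that a handshake failure can be attributed to the protocol version rather than to the certificate.

```python
import socket
import ssl


def make_pinned_context(version):
    """Build a client context that only offers one TLS version (hypothetical helper)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Diagnostics only: skip certificate checks so failures reflect the protocol.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    # Pin both ends of the allowed range to the same version.
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def probe(host, port=443, version=ssl.TLSVersion.TLSv1_2, timeout=5.0):
    """Return the negotiated protocol name, or None if the handshake fails."""
    ctx = make_pinned_context(version)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return tls.version()
    except OSError:
        # ssl.SSLError is a subclass of OSError, so this also covers
        # handshake failures, refused connections and timeouts.
        return None
```

Calling probe('vconnections.org', version=ssl.TLSVersion.TLSv1_2) and then repeating it with other members of ssl.TLSVersion would show which versions the handshake succeeds with. A None result only says that the connection failed, not why.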

gmeans commented, Aug 17, 2015

Sure, @wilsoncusack.

Here is the context factory, which was really simple in the end:

from OpenSSL import SSL
from scrapy.core.downloader.contextfactory import ScrapyClientContextFactory


class CustomContextFactory(ScrapyClientContextFactory):
    """
    Custom context factory that allows SSL negotiation.
    """

    def __init__(self):
        # Use SSLv23_METHOD so we can use protocol negotiation
        self.method = SSL.SSLv23_METHOD

Then make sure you update the settings.py:

DOWNLOADER_CLIENTCONTEXTFACTORY = 'spider.contexts.CustomContextFactory'

Yes, I had to update OpenSSL via Homebrew for this to work. That’s because Apple has stopped using OpenSSL and switched to their own libraries.

I haven’t seen any side effects, but I did this in a virtualenv.
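
A note on where the class lives: Scrapy treats the value of DOWNLOADER_CLIENTCONTEXTFACTORY as a dotted import path, so 'spider.contexts.CustomContextFactory' presumably means a file named contexts.py inside a project package named spider. The sketch below imitates that kind of lookup with the standard library only. The helper name load_dotted_path is hypothetical and this is not Scrapy's own code, just an illustration of how such a string is split into a module and a class name.

```python
from importlib import import_module


def load_dotted_path(path):
    """Resolve 'package.module.Name' to the object it names (illustrative helper)."""
    module_path, _, attr_name = path.rpartition(".")
    if not module_path:
        raise ValueError("expected a full dotted path, got %r" % path)
    # Import the module part, then fetch the class (or other attribute) from it.
    module = import_module(module_path)
    return getattr(module, attr_name)
```

With the layout assumed above, load_dotted_path('spider.contexts.CustomContextFactory') would import spider/contexts.py and return the class, so a different project name or file name just means changing the string in settings.py to match.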

bpanatta commented, Dec 1, 2018

Hi, I am new to Scrapy. Where have you stored this file, and under what name? Also, is spider your bot name?

Just set the DOWNLOADER_CLIENT_TLS_METHOD setting to 'TLSv1.2' in your project’s settings.py. You no longer need a custom context factory to solve this problem.
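
For reference, a minimal settings.py fragment for that suggestion might look like the following. It assumes a Scrapy release that supports DOWNLOADER_CLIENT_TLS_METHOD (to my knowledge 1.1 or later, so not the 1.0.3 from the original report), and the comments list the values I understand the setting to accept.

```python
# settings.py (fragment)

# Force the downloader to use TLS 1.2 for HTTPS connections.
# Other accepted values, as far as I know: 'TLS' (the default, which
# negotiates the protocol), 'TLSv1.0', 'TLSv1.1' and 'SSLv3'.
DOWNLOADER_CLIENT_TLS_METHOD = 'TLSv1.2'

# With this in place, no custom DOWNLOADER_CLIENTCONTEXTFACTORY entry is needed.
```

If the server only speaks an older protocol, pinning 'TLSv1.2' would fail in the same way, in which case the default 'TLS' value (protocol negotiation) may be the better choice.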