SSL issue when scraping website

I have a spider that’s throwing the following error when trying to crawl this URL.

>>> fetch('https://vconnections.org/resources')
2015-08-12 10:07:28 [scrapy] INFO: Spider opened
2015-08-12 10:07:28 [scrapy] DEBUG: Retrying <GET https://vconnections.org/resources> (failed 1 times): [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
2015-08-12 10:07:33 [scrapy] DEBUG: Gave up retrying <GET https://vconnections.org/resources> (failed 2 times): [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Users/gmeans/.virtualenvs/backlink/lib/python2.7/site-packages/scrapy/shell.py", line 87, in fetch
    reactor, self._schedule, request, spider)
  File "/Users/gmeans/.virtualenvs/backlink/lib/python2.7/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "<string>", line 2, in raiseException
ResponseNeverReceived: [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]

Other SSL URLs work fine, and I tried implementing the solution from this previous issue:

https://github.com/scrapy/scrapy/issues/981

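from OpenSSL import SSL
from twisted.internet.ssl import ClientContextFactory
from twisted.internet._sslverify import ClientTLSOptions  # private Twisted module
from scrapy.core.downloader.contextfactory import ScrapyClientContextFactory


class CustomContextFactory(ScrapyClientContextFactory):
    def getContext(self, hostname=None, port=None):
        ctx = ClientContextFactory.getContext(self)
        # Enable all workarounds to SSL bugs as documented by
        # http://www.openssl.org/docs/ssl/SSL_CTX_set_options.html
        ctx.set_options(SSL.OP_ALL)
        if hostname:
            # Registers a handshake callback that sends the hostname (SNI)
            # and checks it against the server certificate
            ClientTLSOptions(hostname, ctx)
        return ctx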

Scrapy==1.0.3 Twisted==15.3.0 pyOpenSSL==0.15.1

OpenSSL 1.0.1k 8 Jan 2015

Any ideas on what else I could try? Thanks!

Issue Analytics

  • State: closed
  • Created 8 years ago
  • Comments:29 (9 by maintainers)

Top GitHub Comments

7 reactions
gmeans commented, Aug 17, 2015

Sure, @wilsoncusack.

Here is the context factory; it turned out to be really simple:

from OpenSSL import SSL
from scrapy.core.downloader.contextfactory import ScrapyClientContextFactory


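class CustomContextFactory(ScrapyClientContextFactory):
    """
    Custom context factory that allows TLS protocol version negotiation.
    """

    def __init__(self):
        # Scrapy < 1.1 defaults to SSL.TLSv1_METHOD (TLS 1.0 only).
        # SSLv23_METHOD instead lets OpenSSL negotiate the highest version
        # both sides support, e.g. TLS 1.2.
        self.method = SSL.SSLv23_METHOD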

Then make sure you update settings.py:

DOWNLOADER_CLIENTCONTEXTFACTORY = 'spider.contexts.CustomContextFactory'
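The setting value is a dotted import path, so the file location follows from it: for 'spider.contexts.CustomContextFactory', the class must live in a module importable as spider.contexts, i.e. spider/contexts.py inside a package named spider (presumably the project package created by `scrapy startproject`, not the spider's `name`). The stdlib sketch below mimics how such a path is resolved; Scrapy's own helper for this is scrapy.utils.misc.load_object, and the function names here are hypothetical.

```python
import importlib


def load_object(path):
    """Import the object named by a dotted path like 'package.module.ClassName'.

    Hypothetical stdlib stand-in for scrapy.utils.misc.load_object.
    """
    module_path, _, name = path.rpartition(".")
    if not module_path:
        raise ValueError(f"not a full dotted path: {path!r}")
    module = importlib.import_module(module_path)  # e.g. imports spider.contexts
    return getattr(module, name)                   # e.g. CustomContextFactory


def expected_source_file(path):
    """Return the file (relative to the project root) that must define the class."""
    module_path, _, _ = path.rpartition(".")
    return module_path.replace(".", "/") + ".py"


print(expected_source_file("spider.contexts.CustomContextFactory"))
```

If `load_object` raises ModuleNotFoundError for your setting, the file is not where the dotted path says it is, or the project root is not on the import path.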

Yes, I had to update OpenSSL via Homebrew for this to work. That's because Apple has stopped using OpenSSL and switched to its own libraries.

No side effect I’ve seen, but I did this in a virtualenv.
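The fix works by letting OpenSSL negotiate the protocol version instead of pinning one. To see which versions a server actually accepts, outside Scrapy, a probe like the sketch below pins one version per handshake. It uses Python 3's stdlib ssl module (not pyOpenSSL, which Scrapy uses), so it assumes an ssl build that supports `minimum_version`/`maximum_version`; the function names are hypothetical, and certificate checks are turned off because only protocol support is being tested.

```python
import socket
import ssl

# Versions to try one at a time, oldest first. Python 3.10+ emits a
# DeprecationWarning when pinning TLSv1 or TLSv1_1.
CANDIDATE_VERSIONS = [
    ssl.TLSVersion.TLSv1,    # what a TLSv1_METHOD-only client can speak
    ssl.TLSVersion.TLSv1_1,
    ssl.TLSVersion.TLSv1_2,
    ssl.TLSVersion.TLSv1_3,
]


def pinned_context(version):
    """Client context restricted to exactly one TLS version, no cert checks."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False      # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def accepted_versions(host, port=443, timeout=5.0):
    """Return the versions (e.g. 'TLSv1.2') the server completes a handshake with."""
    accepted = []
    for version in CANDIDATE_VERSIONS:
        try:
            ctx = pinned_context(version)
        except ValueError:
            # The local OpenSSL build cannot use this version at all.
            continue
        try:
            with socket.create_connection((host, port), timeout=timeout) as sock:
                with ctx.wrap_socket(sock, server_hostname=host) as tls:
                    accepted.append(tls.version())
        except OSError:
            # ssl.SSLError is an OSError subclass: handshake rejected,
            # connection refused, or timed out.
            continue
    return accepted
```

For example, `accepted_versions("vconnections.org")` lists what that host accepts today; if 'TLSv1' is missing from the result, a client pinned to TLS 1.0 fails in the way the log in the question shows.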

4 reactions
bpanatta commented, Dec 1, 2018

Quoted question: "Hi, I am new to Scrapy. Where have you stored this file, and under what name? Also, is spider your bot name?"

Just set the DOWNLOADER_CLIENT_TLS_METHOD setting to 'TLSv1.2' in your project's settings.py (Scrapy 1.1 or later). You no longer need the custom context factory to solve this problem.
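Per the Scrapy settings documentation, this setting accepts 'TLS' (the default, OpenSSL's TLS_method(), also known as SSLv23_method(), which negotiates the version much like the custom factory above), 'TLSv1.0' (the pre-1.1 behavior), 'TLSv1.1', or 'TLSv1.2'. A minimal settings.py fragment for the suggestion above:

```python
# settings.py (Scrapy 1.1 or later)
# Force TLS 1.2 for HTTPS requests made by the default HTTP/1.1 downloader.
DOWNLOADER_CLIENT_TLS_METHOD = 'TLSv1.2'

# The old override is then unnecessary and can be removed:
# DOWNLOADER_CLIENTCONTEXTFACTORY = 'spider.contexts.CustomContextFactory'
```

Leaving the setting at its default 'TLS' keeps negotiation on, which also reaches TLS 1.2 on servers that support it; pinning 'TLSv1.2' rejects servers that only speak older versions.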
