Scrapyd does not support spiders that use AsyncioSelectorReactor

Currently, scrapyd does not support spiders that use asyncio coroutines. Uploading such a spider to scrapyd fails with the error below. I didn’t see a way to override the Twisted reactor implementation in scrapyd.

scrapyd_1        | 2020-05-16T13:56:19+0000 [_GenericHTTPChannelProtocol,0,172.18.0.1] Unhandled Error
scrapyd_1        | 	Traceback (most recent call last):
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/twisted/web/http.py", line 2284, in allContentReceived
scrapyd_1        | 	    req.requestReceived(command, path, version)
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/twisted/web/http.py", line 946, in requestReceived
scrapyd_1        | 	    self.process()
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/twisted/web/server.py", line 235, in process
scrapyd_1        | 	    self.render(resrc)
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/twisted/web/server.py", line 302, in render
scrapyd_1        | 	    body = resrc.render(self)
scrapyd_1        | 	--- <exception caught here> ---
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapyd/webservice.py", line 21, in render
scrapyd_1        | 	    return JsonResource.render(self, txrequest).encode('utf-8')
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapyd/utils.py", line 20, in render
scrapyd_1        | 	    r = resource.Resource.render(self, txrequest)
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/twisted/web/resource.py", line 265, in render
scrapyd_1        | 	    return m(request)
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapyd/webservice.py", line 88, in render_POST
scrapyd_1        | 	    spiders = get_spider_list(project, version=version)
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapyd/utils.py", line 134, in get_spider_list
scrapyd_1        | 	    raise RuntimeError(msg.encode('unicode_escape') if six.PY2 else msg)
scrapyd_1        | 	builtins.RuntimeError: /usr/local/lib/python3.6/dist-packages/scrapy/utils/project.py:94: ScrapyDeprecationWarning: Use of environment variables prefixed with SCRAPY_ to override settings is deprecated. The following environment variables are currently defined: EGG_VERSION
scrapyd_1        | 	  ScrapyDeprecationWarning
scrapyd_1        | 	Traceback (most recent call last):
scrapyd_1        | 	  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
scrapyd_1        | 	    "__main__", mod_spec)
scrapyd_1        | 	  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
scrapyd_1        | 	    exec(code, run_globals)
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapyd/runner.py", line 40, in <module>
scrapyd_1        | 	    main()
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapyd/runner.py", line 37, in main
scrapyd_1        | 	    execute()
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapy/cmdline.py", line 144, in execute
scrapyd_1        | 	    cmd.crawler_process = CrawlerProcess(settings)
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapy/crawler.py", line 265, in __init__
scrapyd_1        | 	    super(CrawlerProcess, self).__init__(settings)
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapy/crawler.py", line 141, in __init__
scrapyd_1        | 	    self._handle_twisted_reactor()
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapy/crawler.py", line 329, in _handle_twisted_reactor
scrapyd_1        | 	    super()._handle_twisted_reactor()
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapy/crawler.py", line 237, in _handle_twisted_reactor
scrapyd_1        | 	    verify_installed_reactor(self.settings["TWISTED_REACTOR"])
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapy/utils/reactor.py", line 77, in verify_installed_reactor
scrapyd_1        | 	    raise Exception(msg)
scrapyd_1        | 	Exception: The installed reactor (twisted.internet.epollreactor.EPollReactor) does not match the requested one (twisted.internet.asyncioreactor.AsyncioSelectorReactor)

The Twisted application runner logs the default reactor for the platform when it starts up, so I think we would need a way to load a reactor before calling run. However, I am not a Twisted expert, and that is just my guess from stepping through the code.

scrapyd_1        | 2020-05-16T13:56:10+0000 [twisted.scripts._twistd_unix.UnixAppLogger#info] twistd 20.3.0 (/usr/bin/python3 3.6.9) starting up.
scrapyd_1        | 2020-05-16T13:56:10+0000 [twisted.scripts._twistd_unix.UnixAppLogger#info] reactor class: twisted.internet.epollreactor.EPollReactor.
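
To illustrate why the traceback ends in a mismatch, here is a minimal sketch of the idea only, not Scrapy's or Twisted's actual code. It assumes a process can hold a single installed reactor and that a later check compares it with the one the project settings request. All names in it (FakeEPollReactor, FakeAsyncioReactor, install, check_reactor) are hypothetical stand-ins.

```python
# Simplified stand-in for the check that raises in the traceback above.
# These classes and functions are hypothetical; they are not part of
# Twisted or Scrapy and only model the behaviour described in the issue.


class FakeEPollReactor:
    """Plays the role of the platform default reactor."""


class FakeAsyncioReactor:
    """Plays the role of AsyncioSelectorReactor."""


# Model the process-wide reactor as a single module-level slot.
_installed = None


def install(reactor_cls):
    """Install a reactor once; a second install is rejected."""
    global _installed
    if _installed is not None:
        raise RuntimeError("a reactor is already installed")
    _installed = reactor_cls()


def check_reactor(requested_cls):
    """Raise if the installed reactor is not an instance of the requested class."""
    if type(_installed) is not requested_cls:
        raise Exception(
            f"The installed reactor ({type(_installed).__name__}) does not "
            f"match the requested one ({requested_cls.__name__})"
        )


# The default reactor gets installed first ...
install(FakeEPollReactor)

# ... and the project later asks for the asyncio one.
try:
    check_reactor(FakeAsyncioReactor)
    mismatch = None
except Exception as exc:
    mismatch = str(exc)
```

Under that assumption, the only way to pass the check is to install the requested reactor before anything else installs the default, which is what the guess above about loading a reactor before calling run amounts to.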

Top GitHub Comments

inakrin commented, Nov 3, 2020

@sseveran I’ve just found one. In the scrapy code which is used by the scrapyd instance, edit the file runner.py (for me the path was /opt/virtualenv/lib/python3.8/site-packages/scrapyd/runner.py). Just below all the existing imports, add this code:

    from scrapy.utils.reactor import install_reactor
    install_reactor('twisted.internet.asyncioreactor.AsyncioSelectorReactor')

I’m not yet sure how the scrapers behave with that setting, but so far I was able to deploy them, launch them, and scrape a few items.
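
The manual edit described in this comment could also be scripted. The sketch below is one possible way to do that, not something from the thread: it takes the text of a runner.py and returns it with the two workaround lines placed after the last top-level import. The helper name add_install_reactor and the sample source are hypothetical, and it needs Python 3.8+ for end_lineno.

```python
import ast

REACTOR_PATH = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# The two lines from the comment above.
PATCH_LINES = [
    "from scrapy.utils.reactor import install_reactor",
    f"install_reactor('{REACTOR_PATH}')",
]


def add_install_reactor(source: str) -> str:
    """Return `source` with the workaround lines after its last top-level import.

    Hypothetical helper: it only edits text and never touches files itself.
    """
    if "install_reactor(" in source:
        return source  # already patched, leave it alone

    tree = ast.parse(source)
    # Collect the last line of every top-level import statement.
    import_ends = [
        node.end_lineno
        for node in tree.body
        if isinstance(node, (ast.Import, ast.ImportFrom))
    ]
    insert_at = max(import_ends, default=0)

    lines = source.splitlines()
    lines[insert_at:insert_at] = PATCH_LINES
    return "\n".join(lines) + "\n"


# Made-up stand-in for a runner module, used only to show the effect.
sample = "import os\nimport sys\n\ndef main():\n    pass\n"
patched = add_install_reactor(sample)
```

To try it on a real installation, read your own runner.py into a string, pass it through the helper, and write the result to a copy first. Keep in mind that editing an installed package in place is lost on the next upgrade or reinstall.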

u23a commented, Sep 28, 2021

@namiwa Thanks for your reply, I’ll try it.