
Scrapyd does not support spiders that use AsyncioSelectorReactor

See original GitHub issue

Currently scrapyd does not support spiders that use asyncio coroutines. When you upload such a spider to scrapyd, it fails with the following error. I didn’t see a way to override the Twisted reactor implementation in scrapyd.

scrapyd_1        | 2020-05-16T13:56:19+0000 [_GenericHTTPChannelProtocol,0,172.18.0.1] Unhandled Error
scrapyd_1        | 	Traceback (most recent call last):
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/twisted/web/http.py", line 2284, in allContentReceived
scrapyd_1        | 	    req.requestReceived(command, path, version)
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/twisted/web/http.py", line 946, in requestReceived
scrapyd_1        | 	    self.process()
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/twisted/web/server.py", line 235, in process
scrapyd_1        | 	    self.render(resrc)
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/twisted/web/server.py", line 302, in render
scrapyd_1        | 	    body = resrc.render(self)
scrapyd_1        | 	--- <exception caught here> ---
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapyd/webservice.py", line 21, in render
scrapyd_1        | 	    return JsonResource.render(self, txrequest).encode('utf-8')
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapyd/utils.py", line 20, in render
scrapyd_1        | 	    r = resource.Resource.render(self, txrequest)
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/twisted/web/resource.py", line 265, in render
scrapyd_1        | 	    return m(request)
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapyd/webservice.py", line 88, in render_POST
scrapyd_1        | 	    spiders = get_spider_list(project, version=version)
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapyd/utils.py", line 134, in get_spider_list
scrapyd_1        | 	    raise RuntimeError(msg.encode('unicode_escape') if six.PY2 else msg)
scrapyd_1        | 	builtins.RuntimeError: /usr/local/lib/python3.6/dist-packages/scrapy/utils/project.py:94: ScrapyDeprecationWarning: Use of environment variables prefixed with SCRAPY_ to override settings is deprecated. The following environment variables are currently defined: EGG_VERSION
scrapyd_1        | 	  ScrapyDeprecationWarning
scrapyd_1        | 	Traceback (most recent call last):
scrapyd_1        | 	  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
scrapyd_1        | 	    "__main__", mod_spec)
scrapyd_1        | 	  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
scrapyd_1        | 	    exec(code, run_globals)
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapyd/runner.py", line 40, in <module>
scrapyd_1        | 	    main()
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapyd/runner.py", line 37, in main
scrapyd_1        | 	    execute()
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapy/cmdline.py", line 144, in execute
scrapyd_1        | 	    cmd.crawler_process = CrawlerProcess(settings)
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapy/crawler.py", line 265, in __init__
scrapyd_1        | 	    super(CrawlerProcess, self).__init__(settings)
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapy/crawler.py", line 141, in __init__
scrapyd_1        | 	    self._handle_twisted_reactor()
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapy/crawler.py", line 329, in _handle_twisted_reactor
scrapyd_1        | 	    super()._handle_twisted_reactor()
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapy/crawler.py", line 237, in _handle_twisted_reactor
scrapyd_1        | 	    verify_installed_reactor(self.settings["TWISTED_REACTOR"])
scrapyd_1        | 	  File "/usr/local/lib/python3.6/dist-packages/scrapy/utils/reactor.py", line 77, in verify_installed_reactor
scrapyd_1        | 	    raise Exception(msg)
scrapyd_1        | 	Exception: The installed reactor (twisted.internet.epollreactor.EPollReactor) does not match the requested one (twisted.internet.asyncioreactor.AsyncioSelectorReactor)
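The last line is the key: the uploaded project requests the asyncio reactor via the TWISTED_REACTOR setting (checked in the verify_installed_reactor frame above), but scrapyd's twistd process has already installed the platform default, EPollReactor, before the project settings are ever read. For reference, the project-side half of this is a single line in the project's settings.py (a minimal sketch; the setting name and value are exactly those shown in the traceback):

```python
# settings.py of the Scrapy project — declares which Twisted reactor the
# spiders require; this is the value compared by verify_installed_reactor
# in the traceback above.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```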

The Twisted application runner logs the platform's default reactor when it starts up, so I think we would need a way to load a reactor before calling run. However, I am not a Twisted expert; that is just my guess from having stepped through the code.

scrapyd_1        | 2020-05-16T13:56:10+0000 [twisted.scripts._twistd_unix.UnixAppLogger#info] twistd 20.3.0 (/usr/bin/python3 3.6.9) starting up.
scrapyd_1        | 2020-05-16T13:56:10+0000 [twisted.scripts._twistd_unix.UnixAppLogger#info] reactor class: twisted.internet.epollreactor.EPollReactor.
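Load order is the whole problem: Twisted allows exactly one reactor per process, and whichever is installed first wins. Plain asyncio has the same install-before-first-use pattern with event-loop policies, which makes a small self-contained illustration of the idea (the TaggedPolicy name is invented for this example; it is not part of scrapyd or Twisted):

```python
import asyncio

# Analogy in plain asyncio: the event-loop policy must be set before the
# first loop is created, just as a Twisted reactor must be installed
# before anything imports twisted.internet.reactor.
class TaggedPolicy(asyncio.DefaultEventLoopPolicy):
    """Marker policy so we can verify which policy produced the loop."""

asyncio.set_event_loop_policy(TaggedPolicy())

# Every loop created from here on comes from TaggedPolicy; setting the
# policy afterwards would not retroactively change an existing loop.
loop = asyncio.new_event_loop()
try:
    result = loop.run_until_complete(asyncio.sleep(0, result="ok"))
finally:
    loop.close()
```

In the same way, scrapyd would have to install AsyncioSelectorReactor before twistd's startup code imports the default reactor for the platform.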

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 10

Top GitHub Comments

5 reactions
inakrin commented, Nov 3, 2020

@sseveran I’ve just found one. In the Scrapy code used by the scrapyd instance, edit the file runner.py (for me the path was /opt/virtualenv/lib/python3.8/site-packages/scrapyd/runner.py). Just below all the existing imports, add this code:

	from scrapy.utils.reactor import install_reactor
	install_reactor('twisted.internet.asyncioreactor.AsyncioSelectorReactor')

I’m not yet sure how the scrapers behave with that setting, but so far I was able to deploy them, launch them, and scrape a few items.

0 reactions
u23a commented, Sep 28, 2021

@namiwa Thanks for your reply, I’ll try it.

Read more comments on GitHub >

