Scrapyd does not support spiders that use AsyncioSelectorReactor
Currently, Scrapyd does not support spiders that use asyncio coroutines. When you upload such a spider to Scrapyd, it fails with the following error. I didn't see a way to override the Twisted reactor implementation in Scrapyd.
scrapyd_1 | 2020-05-16T13:56:19+0000 [_GenericHTTPChannelProtocol,0,172.18.0.1] Unhandled Error
scrapyd_1 | Traceback (most recent call last):
scrapyd_1 | File "/usr/local/lib/python3.6/dist-packages/twisted/web/http.py", line 2284, in allContentReceived
scrapyd_1 | req.requestReceived(command, path, version)
scrapyd_1 | File "/usr/local/lib/python3.6/dist-packages/twisted/web/http.py", line 946, in requestReceived
scrapyd_1 | self.process()
scrapyd_1 | File "/usr/local/lib/python3.6/dist-packages/twisted/web/server.py", line 235, in process
scrapyd_1 | self.render(resrc)
scrapyd_1 | File "/usr/local/lib/python3.6/dist-packages/twisted/web/server.py", line 302, in render
scrapyd_1 | body = resrc.render(self)
scrapyd_1 | --- <exception caught here> ---
scrapyd_1 | File "/usr/local/lib/python3.6/dist-packages/scrapyd/webservice.py", line 21, in render
scrapyd_1 | return JsonResource.render(self, txrequest).encode('utf-8')
scrapyd_1 | File "/usr/local/lib/python3.6/dist-packages/scrapyd/utils.py", line 20, in render
scrapyd_1 | r = resource.Resource.render(self, txrequest)
scrapyd_1 | File "/usr/local/lib/python3.6/dist-packages/twisted/web/resource.py", line 265, in render
scrapyd_1 | return m(request)
scrapyd_1 | File "/usr/local/lib/python3.6/dist-packages/scrapyd/webservice.py", line 88, in render_POST
scrapyd_1 | spiders = get_spider_list(project, version=version)
scrapyd_1 | File "/usr/local/lib/python3.6/dist-packages/scrapyd/utils.py", line 134, in get_spider_list
scrapyd_1 | raise RuntimeError(msg.encode('unicode_escape') if six.PY2 else msg)
scrapyd_1 | builtins.RuntimeError: /usr/local/lib/python3.6/dist-packages/scrapy/utils/project.py:94: ScrapyDeprecationWarning: Use of environment variables prefixed with SCRAPY_ to override settings is deprecated. The following environment variables are currently defined: EGG_VERSION
scrapyd_1 | ScrapyDeprecationWarning
scrapyd_1 | Traceback (most recent call last):
scrapyd_1 | File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
scrapyd_1 | "__main__", mod_spec)
scrapyd_1 | File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
scrapyd_1 | exec(code, run_globals)
scrapyd_1 | File "/usr/local/lib/python3.6/dist-packages/scrapyd/runner.py", line 40, in <module>
scrapyd_1 | main()
scrapyd_1 | File "/usr/local/lib/python3.6/dist-packages/scrapyd/runner.py", line 37, in main
scrapyd_1 | execute()
scrapyd_1 | File "/usr/local/lib/python3.6/dist-packages/scrapy/cmdline.py", line 144, in execute
scrapyd_1 | cmd.crawler_process = CrawlerProcess(settings)
scrapyd_1 | File "/usr/local/lib/python3.6/dist-packages/scrapy/crawler.py", line 265, in __init__
scrapyd_1 | super(CrawlerProcess, self).__init__(settings)
scrapyd_1 | File "/usr/local/lib/python3.6/dist-packages/scrapy/crawler.py", line 141, in __init__
scrapyd_1 | self._handle_twisted_reactor()
scrapyd_1 | File "/usr/local/lib/python3.6/dist-packages/scrapy/crawler.py", line 329, in _handle_twisted_reactor
scrapyd_1 | super()._handle_twisted_reactor()
scrapyd_1 | File "/usr/local/lib/python3.6/dist-packages/scrapy/crawler.py", line 237, in _handle_twisted_reactor
scrapyd_1 | verify_installed_reactor(self.settings["TWISTED_REACTOR"])
scrapyd_1 | File "/usr/local/lib/python3.6/dist-packages/scrapy/utils/reactor.py", line 77, in verify_installed_reactor
scrapyd_1 | raise Exception(msg)
scrapyd_1 | Exception: The installed reactor (twisted.internet.epollreactor.EPollReactor) does not match the requested one (twisted.internet.asyncioreactor.AsyncioSelectorReactor)
The Twisted application runner logs the default reactor for the platform when it starts up, so I think we would need a way to load a reactor before calling run. However, I am not a Twisted expert, and that is just my guess from stepping through the code.
scrapyd_1 | 2020-05-16T13:56:10+0000 [twisted.scripts._twistd_unix.UnixAppLogger#info] twistd 20.3.0 (/usr/bin/python3 3.6.9) starting up.
scrapyd_1 | 2020-05-16T13:56:10+0000 [twisted.scripts._twistd_unix.UnixAppLogger#info] reactor class: twisted.internet.epollreactor.EPollReactor.
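The exception in the traceback comes from a check that compares the class of the reactor already installed in the process with the class named by the TWISTED_REACTOR setting. The sketch below is a simplified, standard-library-only model of that kind of check, not Scrapy's actual source: `load_object` and `verify_installed` are hypothetical helpers written for illustration, and `queue.Queue` / `queue.LifoQueue` are stand-ins for the epoll and asyncio reactor classes.

```python
import queue
from importlib import import_module


def load_object(path):
    """Resolve a dotted path such as 'package.module.ClassName' to the object."""
    module_path, _, name = path.rpartition(".")
    return getattr(import_module(module_path), name)


def verify_installed(installed, requested_path):
    """Raise if `installed` is not an instance of exactly the requested class."""
    requested_cls = load_object(requested_path)
    installed_cls = installed.__class__
    if installed_cls is not requested_cls:
        installed_path = f"{installed_cls.__module__}.{installed_cls.__name__}"
        raise Exception(
            f"The installed reactor ({installed_path}) does not match "
            f"the requested one ({requested_path})"
        )


# Stand-in for "the platform default was installed first".
installed = queue.Queue()

# Requesting the class that is already installed passes silently.
verify_installed(installed, "queue.Queue")

# Requesting a different class fails, mirroring the error in the traceback.
try:
    verify_installed(installed, "queue.LifoQueue")
except Exception as exc:
    print(exc)
```

Under this model, the only way to pass the check is for the requested class to be the one installed first, which matches the guess above that the reactor has to be chosen before anything else loads the platform default.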
Issue Analytics
- State:
- Created 3 years ago
- Reactions:2
- Comments:10
Top GitHub Comments
@sseveran I've just found one. In the scrapyd package used by your Scrapyd instance, edit the file runner.py (for me the path was /opt/virtualenv/lib/python3.8/site-packages/scrapyd/runner.py). Just below all the existing imports, add this code:
from scrapy.utils.reactor import install_reactor
install_reactor('twisted.internet.asyncioreactor.AsyncioSelectorReactor')
I'm not yet sure how the scrapers behave with that setting, but so far I was able to deploy them, launch them, and scrape a few items.
@namiwa Thanks for your reply, I'll try it.