Improve Testability of Scrapy ( ReactorNotRestartable )
Current Situation
Using Scrapy as described in the tutorials (this (intro/tutorial.html) and this (topics/practices.html)) can throw a twisted.internet.error.ReactorNotRestartable error when run with unittest.
Both CrawlerProcess and CrawlerRunner raise the error.
Also, there seems to be no solution (or at least no easy one) to this problem:
- scrapy-twisted-internet-error-reactornotrestartable-error-after-first-run
- reactornotrestartable-error-in-while-loop-with-scrapy
- scrapy-reactornotrestartable-one-class-to-run-two-or-more-spiders (the solution only works if Scrapy is not wrapped in other domain code)
- scrapy-reactor-not-restartable (The accepted answer requires you to fork the process)
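The process-isolation idea from the last answer can be sketched with the standard library alone. This is only an illustration, not something taken from the linked answers: the helper name run_in_fresh_interpreter and its timeout default are hypothetical. Each call starts a new Python interpreter, so each crawl gets its own, never-started reactor.

```python
import subprocess
import sys


def run_in_fresh_interpreter(source, timeout=300):
    """Run Python source code in a new interpreter process.

    Hypothetical helper: a new process means a new Twisted reactor,
    so code that calls reactor.run() can be executed repeatedly.
    """
    return subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,  # collect stdout/stderr for assertions
        text=True,            # decode output as str instead of bytes
        timeout=timeout,      # do not let a hung crawl block the test run
    )


if __name__ == "__main__":
    # Stand-in for a real crawl script; no Scrapy is needed for this demo.
    result = run_in_fresh_interpreter("print('crawl finished')")
    print(result.returncode, result.stdout.strip())
```

In a unittest test case, the source string would import and call the crawl entry point, and the test would assert on result.returncode or on files the spider wrote. The cost is one interpreter start per test.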
Suggestion
Make Scrapy work in unittest environments without throwing the twisted.internet.error.ReactorNotRestartable error.
Remarks
Might be related to https://github.com/scrapy/scrapy/issues/2594 but the intention is different
Example code
The code below reproduces the error. It does not contain a unit test, but it has essentially the same behavior.
# ====================================================================
# Define the quotes spider
import scrapy
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/page/1/',
        'http://quotes.toscrape.com/page/2/',
    ]

    def parse(self, response):
        # Save each downloaded page to quotes-<page>.html
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)


# ====================================================================
# Function to run spider
def run_spider():
    configure_logging()
    runner = CrawlerRunner({"LOG_LEVEL": 'ERROR'})
    runner.crawl(QuotesSpider)
    d = runner.join()
    d.addBoth(lambda _: reactor.stop())
    reactor.run()  # the script will block here until all crawling jobs are finished


# ====================================================================
if "__main__" == __name__:
    run_spider()
    run_spider()  # This call will fail
Issue Analytics
- State:
- Created 3 years ago
- Comments:7 (3 by maintainers)
Top GitHub Comments
Well, I guess this bug is about API/architecture perception and language perception. So probably a very subjective topic (that we have different opinions about).
I’ll sum up my points and close the bug. I guess everything else just goes around in circles:
- I would say the docs do not describe what you are describing here.
- I would say the “assumes basic knowledge of the Twisted reactor” argument is questionable because
- I would say “because the others also do it” should not count as a strong reason.
Thank you very much for the link, I will try to make use of it.
I still don’t know what you mean by that, and you didn’t provide any unittest-related code. Note that Scrapy itself has an extensive test suite which uses it in a variety of modes.
And this code works as expected. On the other hand, the code you added to the first post is incorrect and goes against the Scrapy documentation: you have effectively reimplemented CrawlerProcess as your run_spider() function, and so it cannot run twice either. You are supposed to start (and stop) the reactor only once. This is not specific to Scrapy, by the way.