Raising CloseSpider from DownloaderMiddleware doesn't work
Behavior:

Raising a CloseSpider exception from a DownloaderMiddleware's process_response doesn't close the spider. Instead, the scraper only outputs CloseSpider to stdout.

Expected behavior: the spider should shut down.
Workarounds:

- Use crawler.stop() in process_response (a complete middleware sketch follows this list):

      self.crawler.stop()
      return None

  This requires making self.crawler available in the DownloaderMiddleware via:

      def __init__(self, crawler):
          self.crawler = crawler

      @classmethod
      def from_crawler(cls, crawler):
          return cls(crawler)

- Use crawler._signal_shutdown() (doesn't work for me).
- Return None from process_response (doesn't work for me).
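Putting the first workaround together, here is a minimal sketch of a downloader middleware that stops the crawl from process_response; the class name, the settings snippet, and the "stop on server error" condition are illustrative, not from the original report:

    # settings.py (example registration; module path and priority are placeholders)
    # DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.StopOnConditionMiddleware": 543}

    class StopOnConditionMiddleware:
        def __init__(self, crawler):
            # Keep a reference to the running crawler so it can be stopped later.
            self.crawler = crawler

        @classmethod
        def from_crawler(cls, crawler):
            # Scrapy builds the middleware through this hook, passing the crawler in.
            return cls(crawler)

        def process_response(self, request, response, spider):
            # Example condition: stop crawling once a server error shows up.
            if response.status >= 500:
                spider.logger.info("Stopping crawl after %s from %s", response.status, response.url)
                self.crawler.stop()
            # process_response must return a Response (or Request), so return the
            # response here rather than None.
            return response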
Issue Analytics
- Created: 7 years ago
- Comments: 7 (5 by maintainers)
Top Results From Across the Web

- Scrapy spider not terminating with use of CloseSpider extension
  This works - if I set CLOSESPIDER_ITEMCOUNT to 10, it terminates ... This exception can be raised from a spider callback to request...
- elegant way to quit scrapy from middleware process_request ...
  I know that from spider, I can terminate the process by raising CloseSpider() exception, but this does not work from middleware's ...
- Downloader Middleware — Scrapy 2.7.1 documentation
  Once the newly returned request is performed, the appropriate middleware chain will be called on the downloaded response. If it raises an IgnoreRequest ...
- Exceptions - Scrapy documentation - Read the Docs
  The exception that must be raised by item pipeline stages to stop ... This exception can be raised by the Scheduler or any...
- Downloader Middleware — Scrapy 1.0.1 documentation - Huihoo
  If it raises an IgnoreRequest exception, the errback function of the request (Request.errback) is called. If no code handles the raised...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
As documented, CloseSpider only works in spider callbacks, so this is not a bug. But supporting CloseSpider in middlewares and extensions sounds reasonable.
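For context, the documented usage is raising CloseSpider from a spider callback; the spider below is only an illustration (its name, URL, and the stop condition are made up):

    import scrapy
    from scrapy.exceptions import CloseSpider

    class ExampleSpider(scrapy.Spider):
        name = "example"
        start_urls = ["https://example.com"]

        def parse(self, response):
            # Raising CloseSpider inside a callback is the documented way to stop
            # the crawl; Scrapy's scraper catches it and closes the spider.
            if b"Access denied" in response.body:
                raise CloseSpider("blocked")
            yield {"url": response.url}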
The built-in scrapy.extensions.closespider.CloseSpider extension uses

    self.crawler.engine.close_spider(spider, 'reason')

to stop crawling on some condition. But this API is poorly documented (if documented at all).

Hey @raphapassini! Currently the only place CloseSpider is caught is https://github.com/scrapy/scrapy/blob/e45ef7dcd987f00b94e94d71593f6b3664ceb89f/scrapy/core/scraper.py#L151
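For illustration, the same engine call can be used from a downloader middleware; this is only a sketch (the class name and the 429 condition are hypothetical, and engine.close_spider is the internal, largely undocumented API mentioned above):

    class CloseOnRateLimitMiddleware:
        def __init__(self, crawler):
            self.crawler = crawler

        @classmethod
        def from_crawler(cls, crawler):
            return cls(crawler)

        def process_response(self, request, response, spider):
            if response.status == 429:
                # Ask the engine to close the spider with a reason string,
                # mirroring what scrapy.extensions.closespider.CloseSpider does.
                self.crawler.engine.close_spider(spider, "rate_limited")
            return response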
It seems it needs to be caught in middleware managers as well: spider middleware manager, downloader middleware manager, ExtensionManager - maybe even in the base MiddlewareManager class, I’m not sure.
If we implement this feature, we'd need to define how it interacts with process_exception methods, i.e. whether those methods get a chance to catch the CloseSpider exception (I think it'd be nice to allow that, though I haven't looked into it in detail).
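To make that open question concrete, here is a sketch of what a process_exception hook could look like if middlewares were allowed to observe CloseSpider; this describes the proposed behavior, not what Scrapy currently does:

    from scrapy.exceptions import CloseSpider

    class CloseSpiderAwareMiddleware:
        def process_exception(self, request, exception, spider):
            # Hypothetical: only relevant if middleware managers start routing
            # CloseSpider through process_exception instead of letting it propagate.
            if isinstance(exception, CloseSpider):
                spider.logger.info("CloseSpider(%s) raised for %s", exception.reason, request.url)
            # Returning None lets the remaining middlewares / the engine handle it.
            return None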