Raising CloseSpider from DownloaderMiddleware doesn't work
Behavior:

Raising a CloseSpider exception from a DownloaderMiddleware's process_response doesn't close the spider. Instead, the scraper only outputs CloseSpider to stdout.

Expected behavior: the spider should shut down.
Workarounds:

- Use crawler.stop() in process_response (a complete middleware sketch follows this list):

      self.crawler.stop()
      return None

  This requires making self.crawler available in the DownloaderMiddleware via:

      def __init__(self, crawler):
          self.crawler = crawler

      @classmethod
      def from_crawler(cls, crawler):
          return cls(crawler)

- Use crawler._signal_shutdown() (doesn't work for me).
- Return None from process_response (doesn't work for me).
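Putting the first workaround together, here is a minimal sketch of a downloader middleware that stops the crawl from process_response; the class name, the settings snippet, and the "stop on server error" condition are illustrative, not from the original report:

    # settings.py (example registration; module path and priority are placeholders)
    # DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.StopOnConditionMiddleware": 543}

    class StopOnConditionMiddleware:
        def __init__(self, crawler):
            # Keep a reference to the running crawler so it can be stopped later.
            self.crawler = crawler

        @classmethod
        def from_crawler(cls, crawler):
            # Scrapy builds the middleware through this hook, passing the crawler in.
            return cls(crawler)

        def process_response(self, request, response, spider):
            # Example condition: stop crawling once a server error shows up.
            if response.status >= 500:
                spider.logger.info("Stopping crawl after %s from %s", response.status, response.url)
                self.crawler.stop()
            # process_response must return a Response (or Request), so return the
            # response here rather than None.
            return response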
Issue Analytics
- Created: 7 years ago
- Comments: 7 (5 by maintainers)
Top Results From Across the Web

- Scrapy spider not terminating with use of CloseSpider extension
  This works - if I set CLOSESPIDER_ITEMCOUNT to 10, it terminates ... This exception can be raised from a spider callback to request...
- elegant way to quit scrapy from middleware process_request ...
  I know that from spider, I can terminate the process by raising CloseSpider() exception, but this does not work from middleware's ...
- Downloader Middleware — Scrapy 2.7.1 documentation
  Once the newly returned request is performed, the appropriate middleware chain will be called on the downloaded response. If it raises an IgnoreRequest ...
- Exceptions - Scrapy documentation - Read the Docs
  The exception that must be raised by item pipeline stages to stop ... This exception can be raised by the Scheduler or any...
- Downloader Middleware — Scrapy 1.0.1 documentation - Huihoo
  If it raises an IgnoreRequest exception, the errback function of the request (Request.errback) is called. If no code handles the raised...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
As documented, CloseSpider only works in spider callbacks, so this is not a bug. But supporting CloseSpider in middlewares and extensions sounds reasonable.
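For context, the documented usage is raising CloseSpider from a spider callback; the spider below is only an illustration (its name, URL, and the stop condition are made up):

    import scrapy
    from scrapy.exceptions import CloseSpider

    class ExampleSpider(scrapy.Spider):
        name = "example"
        start_urls = ["https://example.com"]

        def parse(self, response):
            # Raising CloseSpider inside a callback is the documented way to stop
            # the crawl; Scrapy's scraper catches it and closes the spider.
            if b"Access denied" in response.body:
                raise CloseSpider("blocked")
            yield {"url": response.url}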
The built-in scrapy.extensions.closespider.CloseSpider extension uses

    self.crawler.engine.close_spider(spider, 'reason')

to stop crawling on some condition. But this API is poorly documented (if documented at all).

Hey @raphapassini! Currently the only place CloseSpider is caught is https://github.com/scrapy/scrapy/blob/e45ef7dcd987f00b94e94d71593f6b3664ceb89f/scrapy/core/scraper.py#L151
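For illustration, the same engine call can be used from a downloader middleware; this is only a sketch (the class name and the 429 condition are hypothetical, and engine.close_spider is the internal, largely undocumented API mentioned above):

    class CloseOnRateLimitMiddleware:
        def __init__(self, crawler):
            self.crawler = crawler

        @classmethod
        def from_crawler(cls, crawler):
            return cls(crawler)

        def process_response(self, request, response, spider):
            if response.status == 429:
                # Ask the engine to close the spider with a reason string,
                # mirroring what scrapy.extensions.closespider.CloseSpider does.
                self.crawler.engine.close_spider(spider, "rate_limited")
            return response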
It seems it needs to be caught in middleware managers as well: spider middleware manager, downloader middleware manager, ExtensionManager - maybe even in the base MiddlewareManager class, I’m not sure.
If we implement this feature, we'd need to define how it interacts with process_exception methods, i.e. whether those methods get a chance to catch the CloseSpider exception (I think it'd be nice to allow that, though I haven't looked into it in detail).
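To make that open question concrete, here is a sketch of what a process_exception hook could look like if middlewares were allowed to observe CloseSpider; this describes the proposed behavior, not what Scrapy currently does:

    from scrapy.exceptions import CloseSpider

    class CloseSpiderAwareMiddleware:
        def process_exception(self, request, exception, spider):
            # Hypothetical: only relevant if middleware managers start routing
            # CloseSpider through process_exception instead of letting it propagate.
            if isinstance(exception, CloseSpider):
                spider.logger.info("CloseSpider(%s) raised for %s", exception.reason, request.url)
            # Returning None lets the remaining middlewares / the engine handle it.
            return None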