
Raising CloseSpider from DownloaderMiddleware doesn't work

See original GitHub issue

Behavior: Raising a CloseSpider exception from a DownloaderMiddleware’s process_response doesn’t close the spider. Instead, the scraper only outputs CloseSpider to stdout.


Expected behavior: Spider should shut down.
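
For reference, a minimal downloader middleware that reproduces the report might look like the sketch below; the class name and the 403 trigger condition are illustrative assumptions, not taken from the original issue:

    from scrapy.exceptions import CloseSpider

    class CloseOnResponseMiddleware:
        """Tries (and fails) to stop the crawl by raising CloseSpider from process_response."""

        def process_response(self, request, response, spider):
            if response.status == 403:  # illustrative condition
                # Expected to shut the spider down; as reported, only "CloseSpider"
                # is printed to stdout and crawling continues
                raise CloseSpider("received a 403 response")
            return response

The middleware would be enabled through the DOWNLOADER_MIDDLEWARES setting like any other downloader middleware.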

Workaround:

Instead of raising CloseSpider, call the following from the middleware:

    self.crawler.stop()
    return None

This requires making self.crawler available in the DownloaderMiddleware via:

    def __init__(self, crawler):
        # Keep a reference to the crawler so crawler.stop() can be called later
        self.crawler = crawler

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy passes the running crawler when it instantiates the middleware
        return cls(crawler)
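
Putting both fragments together, a full sketch of the workaround might look like this; the class name and the status-code check are illustrative assumptions, not part of the original report:

    class StopCrawlMiddleware:
        """Downloader middleware that stops the crawl via crawler.stop() instead of CloseSpider."""

        def __init__(self, crawler):
            # Keep a reference to the running crawler
            self.crawler = crawler

        @classmethod
        def from_crawler(cls, crawler):
            return cls(crawler)

        def process_response(self, request, response, spider):
            if response.status == 403:  # illustrative condition
                # Gracefully stop the crawl; the original report returns None at this
                # point, while returning the response keeps process_response's contract
                self.crawler.stop()
            return response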

Issue Analytics

  • State: open
  • Created: 7 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

5 reactions
kmike commented, Feb 20, 2017

As documented, CloseSpider only works in spider callbacks, so this is not a bug:

This exception can be raised from a spider callback to request the spider to be closed/stopped.

But supporting CloseSpider in middlewares and extensions sounds reasonable.

The built-in scrapy.extensions.closespider.CloseSpider extension uses self.crawler.engine.close_spider(spider, 'reason') to stop crawling when a condition is met, but this API is poorly documented (if documented at all).
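
Following that hint, a downloader middleware can call the same engine API directly; a minimal sketch, assuming crawler.engine is available once the crawl is running (the class name and trigger condition are illustrative):

    class CloseViaEngineMiddleware:
        """Closes the spider through the engine, using the same call the
        built-in closespider extension makes."""

        def __init__(self, crawler):
            self.crawler = crawler

        @classmethod
        def from_crawler(cls, crawler):
            return cls(crawler)

        def process_response(self, request, response, spider):
            if response.status == 403:  # illustrative condition
                self.crawler.engine.close_spider(spider, "closed from downloader middleware")
            return response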

3 reactions
kmike commented, Aug 27, 2018

Hey @raphapassini! Currently the only place CloseSpider is caught is https://github.com/scrapy/scrapy/blob/e45ef7dcd987f00b94e94d71593f6b3664ceb89f/scrapy/core/scraper.py#L151

It seems it needs to be caught in middleware managers as well: spider middleware manager, downloader middleware manager, ExtensionManager - maybe even in the base MiddlewareManager class, I’m not sure.

If we implement this feature, we'd need to define how it works together with process_exception methods, i.e. whether those methods get a chance to catch the CloseSpider exception or not (I think it'd be nice to allow that, though I haven't looked into it in detail).
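
To make that open question concrete, a process_exception hook under the proposed behaviour might look like the sketch below. This is purely hypothetical: today Scrapy does not route a CloseSpider raised elsewhere in the downloader middleware chain through process_exception, so the class and its behaviour are assumptions, not existing semantics.

    from scrapy.exceptions import CloseSpider

    class ExceptionObservingMiddleware:
        """Hypothetical: observes exceptions raised further down the middleware chain."""

        def process_exception(self, request, exception, spider):
            if isinstance(exception, CloseSpider):
                # Under the proposed feature, a middleware could react here before
                # the engine closes the spider
                spider.logger.info("CloseSpider(%s) raised while handling %s",
                                   exception.reason, request.url)
            # Returning None lets other process_exception methods (and the engine)
            # continue handling the exception
            return None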
