process_spider_exception not called with exception from spider

According to the documentation, process_spider_exception should also be called when a spider raises an exception. As I understand it, this includes an exception raised from any parse method, like this:

    def parse_item(self, response):
        log.msg("[parse_item] Now in exceptional parse", level=log.INFO)
        raise Exception('foo')

My middleware looks like this:

from scrapy import log  # scrapy.log is the logging API of the Scrapy version used in this report

class ManyExceptionsMiddleware(object):
    def process_spider_output(self, response, result, spider):
        log.msg("[process_spider_output] Shows that middleware IS installed", level=log.INFO)
        return result

    def process_spider_exception(self, response, exception, spider):
        log.msg("[process_spider_exception] Many exceptions on %s" % spider.name, level=log.WARNING)
        return []
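
For readers on newer Scrapy versions, where scrapy.log has been deprecated since Scrapy 1.0 in favor of the standard logging module, the same middleware could be sketched roughly as follows. This is an illustrative rewrite, not code from the original report; the module-level `logger` name is an assumption.

```python
import logging

# Assumption: a module-level logger; the original report used scrapy.log instead.
logger = logging.getLogger(__name__)


class ManyExceptionsMiddleware:
    """Spider middleware from the report, rewritten to use stdlib logging."""

    def process_spider_output(self, response, result, spider):
        # Logged on every response, which shows the middleware is enabled.
        logger.info("[process_spider_output] Shows that middleware IS installed")
        return result

    def process_spider_exception(self, response, exception, spider):
        logger.warning("[process_spider_exception] Many exceptions on %s", spider.name)
        # Returning an iterable (here an empty one) marks the exception as handled.
        return []
```

A spider middleware is a plain Python class, so no Scrapy import is needed for the class itself; it still has to be enabled through the SPIDER_MIDDLEWARES setting.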

This results in:

2015-01-18 18:08:01+0100 [example] DEBUG: Crawled (200) <GET some-secret-url> (referer: some-other-url)
2015-01-18 18:08:01+0100 [scrapy] INFO: [process_spider_output] Shows that middleware IS installed
2015-01-18 18:08:01+0100 [scrapy] INFO: [parse_item] Now in exceptional parse
2015-01-18 18:08:01+0100 [example] ERROR: Spider error processing <GET some-secret-url>
    Traceback (most recent call last):
[...]
    exceptions.Exception: foo

I then added the following method to the middleware to check that process_spider_exception works at all (as far as I can tell, this is the only path where Scrapy itself handles the exception):

def process_spider_input(self, response, spider):
    raise Exception('foo')

Then the output looks like this:

2015-01-18 18:09:53+0100 [example] DEBUG: Crawled (200) <GET some-secret-url> (referer: None)
2015-01-18 18:09:53+0100 [scrapy] WARNING: [process_spider_exception] Many exceptions on some-secret-domain
2015-01-18 18:09:53+0100 [scrapy] INFO: [process_spider_output] Shows that middleware IS installed

If you could tell me where this should be handled, I could look into the code and try to fix it (if I understand it well enough).

Issue Analytics

  • State: closed
  • Created 9 years ago
  • Reactions: 6
  • Comments: 10 (3 by maintainers)

Top GitHub Comments

elacuesta commented, Jun 26, 2019 (2 reactions)

Hello @ccc-larc, by adding those yield statements you are turning the parsing method into a generator, which brings the spider under the scope of #220. A fix (#2061) has been merged but not yet released; it will be included in the next version. This is the output I get when running your code with the current master branch (c81d120b). Note that the item produced before the exception is processed normally.

2019-06-26 10:48:49 [scrapy.core.engine] INFO: Spider opened
2019-06-26 10:48:49 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-06-26 10:48:49 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-06-26 10:48:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://example.org> (referer: None)
2019-06-26 10:48:49 [root] INFO: [process_spider_output] Shows that middleware is installed
2019-06-26 10:48:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://example.org>
{'an': 'item'}
2019-06-26 10:48:49 [root] INFO: [parse] Now in exceptional parse
2019-06-26 10:48:49 [root] WARNING: [process_spider_exception] Exception caught: from parse
2019-06-26 10:48:49 [scrapy.core.engine] INFO: Closing spider (finished)
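
The difference between a plain parse method and a generator can be illustrated in pure Python. The sketch below is not Scrapy's code; `parse_plain`, `parse_generator`, and `consume` are hypothetical names chosen for illustration. It shows that a regular function raises as soon as it is called, while a generator raises only once something iterates over it, after any earlier items have already been yielded.

```python
def parse_plain(response):
    # Regular function: the exception is raised at call time.
    raise ValueError("from parse")


def parse_generator(response):
    # Generator function: calling it only creates a generator object.
    yield {"an": "item"}
    # This line runs only when the consumer asks for the next item.
    raise ValueError("from parse")


def consume(result):
    """Iterate over a callback result, collecting items until it raises."""
    items, error = [], None
    try:
        for item in result:
            items.append(item)
    except ValueError as exc:
        error = exc
    return items, error


# The plain callback fails immediately when called.
try:
    parse_plain(None)
    plain_raised_on_call = False
except ValueError:
    plain_raised_on_call = True

# The generator callback returns without error; the exception surfaces
# later, during iteration, after the first item was already produced.
result = parse_generator(None)
items, error = consume(result)
```

This matches the log above, where the item yielded before the exception is scraped normally and the exception is reported afterwards.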
