process_spider_exception not called with exception from spider
According to the documentation, process_spider_exception should also be called when a spider raises an exception. To my understanding, this includes raising an exception from any parse method, like this:
```python
def parse_item(self, response):
    log.msg("[parse_item] Now in exceptional parse", level=log.INFO)
    raise Exception('foo')
```
My middleware looks like this:
```python
class ManyExceptionsMiddleware(object):
    def process_spider_output(self, response, result, spider):
        log.msg("[process_spider_output] Shows that middleware IS installed", level=log.INFO)
        return result

    def process_spider_exception(self, response, exception, spider):
        log.msg("[process_spider_exception] Many exceptions on %s" % spider.name, level=log.WARNING)
        return []
```
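For completeness, the middleware has to be enabled in the project settings for process_spider_output to fire as shown in the log below. A minimal sketch; the module path and priority value here are assumptions, not taken from the issue:

```python
# settings.py -- the module path and the priority 543 are illustrative assumptions
SPIDER_MIDDLEWARES = {
    'myproject.middlewares.ManyExceptionsMiddleware': 543,
}
```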
This results in:
```
2015-01-18 18:08:01+0100 [example] DEBUG: Crawled (200) <GET some-secret-url> (referer: some-other-url)
2015-01-18 18:08:01+0100 [scrapy] INFO: [process_spider_output] Shows that middleware IS installed
2015-01-18 18:08:01+0100 [scrapy] INFO: [parse_item] Now in exceptional parse
2015-01-18 18:08:01+0100 [example] ERROR: Spider error processing <GET some-secret-url>
	Traceback (most recent call last):
	[...]
	exceptions.Exception: foo
```
Then I added the following method to verify that process_spider_exception works at all (because, as far as I can tell, this is the only case where Scrapy itself routes an exception to it):
```python
def process_spider_input(self, response, spider):
    raise Exception('foo')
```
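This case works because the exception is raised while the middleware manager is actively calling process_spider_input, so it can catch it and route it to process_spider_exception. The routing can be sketched in plain Python; the dispatch function below is a hypothetical illustration, not Scrapy's actual internals:

```python
class ManyExceptionsMiddleware(object):
    def process_spider_input(self, response, spider):
        raise Exception('foo')

    def process_spider_exception(self, response, exception, spider):
        return []  # swallow the exception, produce no results


def dispatch(middleware, response, spider):
    # Hypothetical sketch of the manager's error routing, not Scrapy code:
    # exceptions from process_spider_input are caught and handed to
    # process_spider_exception of the same middleware chain.
    try:
        middleware.process_spider_input(response, spider)
        return None
    except Exception as exc:
        return middleware.process_spider_exception(response, exc, spider)


result = dispatch(ManyExceptionsMiddleware(), response=None, spider=None)
print(result)  # -> []
```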
Then the output looks like this:
```
2015-01-18 18:09:53+0100 [example] DEBUG: Crawled (200) <GET some-secret-url> (referer: None)
2015-01-18 18:09:53+0100 [scrapy] WARNING: [process_spider_exception] Many exceptions on some-secret-domain
2015-01-18 18:09:53+0100 [scrapy] INFO: [process_spider_output] Shows that middleware IS installed
```
If you could tell me where in the code this should happen, I could look into fixing it (if I understand it well enough).
Issue Analytics
- Created: 9 years ago
- Reactions: 6
- Comments: 10 (3 by maintainers)
Top GitHub Comments
Hello @ccc-larc, by adding those yield statements you are turning the parse method into a generator, which makes the spider fall under the scope of #220. A fix was merged (#2061) but has not been released yet; it will be included in the next version. This is the output I get when running your code with the current master branch (c81d120b). Note that the item produced before the exception is processed normally. Closing as fixed by https://github.com/scrapy/scrapy/pull/2061.