Possibility of retrying requests by any response characteristic (using callback).
The retry middleware allows retrying requests depending on the response status. However, some websites return a 200 status code on error, so we may want to retry depending on a response header, or even the response body.
Instead of implementing this in a new middleware or in the spider itself, I suggest a small backwards-compatible change to the retry middleware. The user could define a list of callbacks in a settings variable, like:
RETRY_CALLBACKS = [
    "myproject.retry_callbacks.woocommerce_retry_callback",
    "myproject.retry_callbacks.google_retry_callback",
]
These callbacks would receive both the request and the response objects, so the developer can implement the retry conditions. If a callback returns True, the middleware would call self._retry.
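As a rough illustration of what such callbacks could look like, here is a minimal sketch. The function names come from the settings example above, but their bodies, the error marker string, the `X-Throttled` header, and the `should_retry` helper are all assumptions made up for this example. The stand-in objects only mimic the few attributes the callbacks read, so the snippet runs without Scrapy installed.

```python
from types import SimpleNamespace


def woocommerce_retry_callback(request, response):
    """Ask for a retry when a 200 response carries an error marker in its body.

    The marker string is a made-up example, not a real WooCommerce message.
    """
    return response.status == 200 and "temporarily unavailable" in response.text


def google_retry_callback(request, response):
    """Ask for a retry when a (hypothetical) header signals throttling.

    Note: real Scrapy header values are bytes; the stand-ins below use str.
    """
    return response.headers.get("X-Throttled") == "1"


def should_retry(callbacks, request, response):
    """Return True as soon as any callback asks for a retry."""
    return any(callback(request, response) for callback in callbacks)


# Stand-in objects exposing only the attributes the callbacks read.
request = SimpleNamespace(url="https://example.com/product/1")
soft_error = SimpleNamespace(
    status=200, headers={}, text="Service temporarily unavailable"
)
healthy = SimpleNamespace(status=200, headers={}, text="<html>product page</html>")

callbacks = [woocommerce_retry_callback, google_retry_callback]
print(should_retry(callbacks, request, soft_error))
print(should_retry(callbacks, request, healthy))
```

In a subclass of the retry middleware, the response-handling method could run this check and hand the request to self._retry when it returns True, falling back to the normal status-code logic otherwise. How the dotted paths in RETRY_CALLBACKS get resolved into functions is left out of this sketch.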
This is something I have used in a couple of projects (inheriting from the retry middleware), so I’d be happy to add this feature to Scrapy.
Top GitHub Comments
I think adding these callback methods somewhat breaks the clear structure of the Scrapy core. Also, this functionality can be implemented in middlewares, which can be configured with different priority values to control when each retry middleware runs, instead of evaluating all of them at the same moment.
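For reference, the middleware-based alternative would be configured roughly as below. This is only a sketch: the two `myproject.middlewares` classes are hypothetical names, and the order values are arbitrary choices placed next to the built-in retry middleware, which sits at 550 by default.

```python
# settings.py (sketch): the two custom middleware paths are hypothetical.
DOWNLOADER_MIDDLEWARES = {
    # Scrapy's built-in retry middleware uses order 550 by default.
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": 550,
    # Higher numbers sit closer to the downloader, so their
    # process_response() methods see a response first.
    "myproject.middlewares.WooCommerceRetryMiddleware": 551,
    "myproject.middlewares.GoogleRetryMiddleware": 552,
}
```

With this layout each site-specific check lives in its own middleware class, at the cost of repeating or inheriting the retry bookkeeping in each one.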
Hi @grammy-jiang,
Only point 1 is true: my idea is that users can customize the retry condition with their own functions (what I called callbacks, maybe not the best choice of words) that receive the response and return a boolean. This would keep the retry logic outside of the spider while still using all the retry parameters (number of retries, retry priority, etc.) of the base middleware.
I will probably publish a separate package that people can import if they need it, instead of trying to merge this into Scrapy itself.