Possibility of retrying requests by any response characteristic (using callback).

The retry middleware can retry requests depending on the response status. However, some websites return a 200 status code on error, so we may want to retry based on a response header or even the response body.

Instead of implementing this in a new middleware or in the spider itself, I suggest a small, backwards-compatible change to the retry middleware. The user could define a list of callbacks in a setting, like:

RETRY_CALLBACKS = [
    "myproject.retry_callbacks.woocommerce_retry_callback",
    "myproject.retry_callbacks.google_retry_callback",
]

These callbacks would receive both the request and the response object, so the developer can implement the retry conditions. If a callback returns True, the middleware would call self._retry.
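To make the proposal concrete, here is a minimal sketch of what such callbacks and the check around them could look like. It is not Scrapy's API: the body marker, the header name, and the helpers `load_callback` and `should_retry` are assumptions made for illustration, and the request and response are simple stand-ins so the snippet runs without Scrapy installed.

```python
"""Hypothetical myproject/retry_callbacks.py plus a small dispatch helper."""
from importlib import import_module
from types import SimpleNamespace


def woocommerce_retry_callback(request, response):
    """Ask for a retry when a 200 response carries an error marker in its body.

    The marker string is an illustrative assumption, not a real WooCommerce message.
    """
    return response.status == 200 and b"temporarily unavailable" in response.body


def google_retry_callback(request, response):
    """Ask for a retry when a (made-up) error header is present."""
    return bool(response.headers.get("X-Backend-Error"))


def load_callback(dotted_path):
    """Resolve a 'package.module.function' string to the object it names."""
    module_path, _, name = dotted_path.rpartition(".")
    return getattr(import_module(module_path), name)


def should_retry(callbacks, request, response):
    """Return True as soon as any callback asks for a retry."""
    return any(callback(request, response) for callback in callbacks)


# Stand-ins for the request and response objects the middleware would pass in.
request = SimpleNamespace(url="https://example.com/product/1")
ok = SimpleNamespace(status=200, headers={}, body=b"<html>Product page</html>")
soft_error = SimpleNamespace(
    status=200, headers={}, body=b"Service temporarily unavailable"
)

callbacks = [woocommerce_retry_callback, google_retry_callback]
print(should_retry(callbacks, request, ok))          # False
print(should_retry(callbacks, request, soft_error))  # True
```

In a subclass of the retry middleware, a check like `should_retry` would run on each response, and a True result would lead to the self._retry call described above.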

This is something I have used in a couple of projects (inheriting from the retry middleware), so I’d be happy to add this feature to Scrapy.

Issue Analytics

State:
Created 5 years ago
Reactions: 1
Comments: 8 (2 by maintainers)

Top GitHub Comments

1 reaction

VMRuiz commented, May 11, 2018

I think adding these callback methods somewhat breaks the clear structure of the Scrapy core. Also, this functionality can be implemented in middlewares, which can be configured with different priority values to control when each retry middleware runs, instead of evaluating all of them at the same moment.
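As a rough illustration of this alternative, separate retry middlewares could be registered with their own priorities in the project settings. The two class paths below are hypothetical; the only fixed point is that Scrapy's built-in RetryMiddleware sits at priority 550 by default.

```python
# settings.py (sketch; both middleware paths are hypothetical project classes)
DOWNLOADER_MIDDLEWARES = {
    # Higher numbers sit closer to the downloader, so their process_response()
    # runs first: 552, then 551, then the built-in RetryMiddleware at 550.
    "myproject.middlewares.WooCommerceRetryMiddleware": 551,
    "myproject.middlewares.GoogleRetryMiddleware": 552,
}
```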

0 reactions

dyeray commented, Jun 8, 2018

Hi @grammy-jiang,

Only point 1 is true: my idea is that the retry condition can be customized by users through their own functions (what I called callbacks, maybe not the best choice of words) that receive the response and return a boolean. This would keep the retry logic outside of the spider while still using all the retry parameters (number of retries, retry priority, etc.) of the base middleware.

I will probably publish a separate package that people can import if they need it, instead of trying to merge this into base Scrapy.
