Possibility of retrying requests by any response characteristic (using callback).

The retry middleware allows retrying requests depending on the response status. However, some websites return a 200 status code on error, so we may want to retry depending on a response header, or even the response body.

Instead of implementing this in a new middleware or in the spider itself, I suggest a small backwards-compatible change to the retry middleware. The user could define a list of callbacks in a settings variable, like:

RETRY_CALLBACKS = [
    "myproject.retry_callbacks.woocommerce_retry_callback",
    "myproject.retry_callbacks.google_retry_callback",
]

These callbacks would receive both the request and the response object, so the developer can implement the retry conditions. If any callback returns True, the middleware would call self._retry.

This is something I have used in a couple of projects (inheriting from the retry middleware), so I’d be happy to add this feature to Scrapy.
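
As a rough illustration of the proposal above, the sketch below shows what such a callback and the dotted-path lookup might look like. It uses only the standard library so it runs without Scrapy. The names load_callback, should_retry and soft_error_callback are hypothetical, and the SimpleNamespace objects are stand-ins for Scrapy's request and response objects, assuming only that a response exposes status and text.

```python
from importlib import import_module
from types import SimpleNamespace


def load_callback(dotted_path):
    """Resolve a dotted path such as 'package.module.func' to the callable it names."""
    module_path, _, name = dotted_path.rpartition(".")
    return getattr(import_module(module_path), name)


def soft_error_callback(request, response):
    """Hypothetical callback: ask for a retry when a 200 response carries an error page."""
    return response.status == 200 and "temporarily unavailable" in response.text.lower()


def should_retry(callbacks, request, response):
    """Return True as soon as any callback asks for a retry."""
    return any(callback(request, response) for callback in callbacks)


# Stand-ins for Scrapy's request/response objects, for demonstration only.
request = SimpleNamespace(url="https://example.com/item/1")
blocked = SimpleNamespace(status=200, text="Service temporarily unavailable")
ok = SimpleNamespace(status=200, text="<html>Product page</html>")

print(should_retry([soft_error_callback], request, blocked))
print(should_retry([soft_error_callback], request, ok))
```

In a real project the entries of RETRY_CALLBACKS would presumably be resolved once when the middleware is created, and should_retry would be evaluated for each response before falling back to the status-code check.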

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Reactions: 1
  • Comments: 8 (2 by maintainers)

Top GitHub Comments

1 reaction
VMRuiz commented, May 11, 2018

I think adding these callback methods somehow breaks the clear structure of the Scrapy core. Also, this functionality can be implemented in middlewares, which can be configured with different priority values to control when each retry middleware runs, instead of evaluating all of them at the same moment.

0 reactions
dyeray commented, Jun 8, 2018

Hi @grammy-jiang,

Only point 1 is true: my idea is that the retry condition can be customized by users through their own functions (what I called callbacks, maybe not the best choice of words) that receive the response and return a boolean. This would keep the retry logic outside of the spider while still using all the retry parameters (number of retries, retry priority, etc.) of the base middleware.

I will probably publish a separate package that people can import if they need it, instead of trying to merge this into Scrapy itself.
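
One way the subclassing approach discussed in this thread could be packaged is as a mixin, sketched below. This is an assumption about how it might be done, not the commenter's actual code. CallbackRetryMixin, StandInRetryBase, body_says_error and CallbackRetryMiddleware are hypothetical names. The stand-in base exists only so the sketch runs without Scrapy. In a real project the mixin would be combined with Scrapy's RetryMiddleware, relying on its private _retry(request, reason, spider) method (which returns a new request, or None once retries are exhausted) and on the dont_retry request meta key. Because _retry is private, it may change between Scrapy versions.

```python
from types import SimpleNamespace


class CallbackRetryMixin:
    """Check user-supplied callbacks before the base middleware's own retry logic."""

    retry_callbacks = ()

    def process_response(self, request, response, spider):
        # Honor the per-request opt-out before consulting any callback.
        if request.meta.get("dont_retry", False):
            return response
        for callback in self.retry_callbacks:
            if callback(request, response):
                reason = f"retry callback {callback.__name__} matched"
                # Fall back to the response when the base declines to retry.
                return self._retry(request, reason, spider) or response
        return super().process_response(request, response, spider)


class StandInRetryBase:
    """Stand-in for the real retry middleware so this sketch runs without Scrapy."""

    def _retry(self, request, reason, spider):
        return {"retry_of": request.url, "reason": reason}

    def process_response(self, request, response, spider):
        return response


def body_says_error(request, response):
    """Hypothetical callback: retry when the body contains the word 'error'."""
    return "error" in response.text


class CallbackRetryMiddleware(CallbackRetryMixin, StandInRetryBase):
    retry_callbacks = (body_says_error,)


request = SimpleNamespace(url="https://example.com", meta={})
middleware = CallbackRetryMiddleware()
failed = SimpleNamespace(status=200, text="error: try later")
print(middleware.process_response(request, failed, spider=None))
```

Because the callbacks only decide whether to retry, the retry count, priority adjustment and give-up behavior would all stay in the base middleware, which is the point made in the comment above.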
