Adjust throttling for 429 response codes
The HTTP 429 response code is returned when we hit the rate limit of an API. Usually, it is just a matter of waiting a bit before sending new requests. The “problem” is that, if the concurrency settings allow more requests than the API permits, we’ll keep getting 429s.
A solution would be to tune throttling so it delays requests based on 429s.
It could be a separate extension/middleware, since AutoThrottle seems quite specific to throttle control based on latency.
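To make the idea concrete, here is a minimal sketch of a delay that reacts to 429s: it grows the delay multiplicatively on each 429 and shrinks it slowly on any other response. It is not Scrapy code; the class name `AdaptiveDelay` and all of its parameters and defaults are hypothetical, chosen only for illustration.

```python
class AdaptiveDelay:
    """Hypothetical helper: a download delay that reacts to 429 responses."""

    def __init__(self, min_delay=0.5, max_delay=60.0, backoff=2.0, recovery=0.9):
        # All defaults are illustrative assumptions, not Scrapy settings.
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.backoff = backoff      # multiplier applied after a 429
        self.recovery = recovery    # multiplier (< 1) applied after other responses
        self.delay = min_delay

    def update(self, status):
        """Adjust the delay for an observed HTTP status and return it."""
        if status == 429:
            # Rate limited: back off, but never beyond max_delay.
            self.delay = min(self.delay * self.backoff, self.max_delay)
        else:
            # Not rate limited: relax gradually, but never below min_delay.
            self.delay = max(self.delay * self.recovery, self.min_delay)
        return self.delay
```

A middleware could call `update(response.status)` for each response and wait for the returned number of seconds before the next request to the same target.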
Also, it could be worth considering that some APIs return the waiting time: https://github.com/scrapy/scrapy/issues/3849
Here is a previous PR for this: https://github.com/scrapy/scrapy/pull/3061
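For APIs that report the waiting time, a sketch like the following could turn it into a number of seconds. It assumes the server uses the standard `Retry-After` header, whose value is either an integer number of seconds or an HTTP date; the linked issue may cover other conventions. The function name `retry_after_seconds` is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Return the wait in seconds from a Retry-After value, or None if unusable."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        # Delay-seconds form, e.g. "120".
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Unparseable value: let the caller fall back to its own delay.
        return None
    if when.tzinfo is None:
        # Treat a date without zone information as UTC (an assumption).
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means there is no need to wait.
    return max((when - now).total_seconds(), 0.0)
```

When the function returns `None`, the caller would fall back to whatever throttling delay it already uses.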
Issue Analytics
- State:
- Created 4 years ago
- Comments:14 (7 by maintainers)
Top GitHub Comments
@Gallaecio I think a per-domain basis would be more fitting for my use case. However, with #5015 close to being approved (I’ve been eagerly following that discussion), I believe that functionality will make it easier to account for status codes when throttling, by subclassing the existing AutoThrottle middleware. Still, I feel this functionality addresses a common dilemma, which calls for an official middleware.
From the specification:
So while we might assume that a 429 means “too many requests to this domain”, it may also mean “too many requests to this specific endpoint”, “too many requests with this cookie”, and so on. Maybe we could assume the domain scenario by default, since it is probably the most likely in web crawling, but ideally we should figure out a way to allow flexibility to deal with other scenarios effectively.
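One way to get that flexibility could be a pluggable key function that decides which requests share a throttling state. The sketch below is only an illustration of the idea, not a proposed Scrapy API: `domain_key`, `endpoint_key`, `cookie_key` and `ThrottleRegistry` are hypothetical names, and the default numbers are arbitrary.

```python
from urllib.parse import urlsplit


def domain_key(url, cookies=None):
    """Default scope: all requests to the same host share one delay."""
    return urlsplit(url).hostname


def endpoint_key(url, cookies=None):
    """Narrower scope: one delay per host and path, ignoring the query string."""
    parts = urlsplit(url)
    return (parts.hostname, parts.path)


def cookie_key(url, cookies=None):
    """Session scope: one delay per host and set of cookies."""
    return (urlsplit(url).hostname, frozenset((cookies or {}).items()))


class ThrottleRegistry:
    """Hypothetical store of per-key delays that grow on each 429."""

    def __init__(self, key_func=domain_key, initial=1.0, backoff=2.0, max_delay=60.0):
        self.key_func = key_func
        self.initial = initial
        self.backoff = backoff
        self.max_delay = max_delay
        self.delays = {}

    def record_429(self, url, cookies=None):
        """Increase the delay for the scope this request belongs to."""
        key = self.key_func(url, cookies)
        current = self.delays.get(key, 0.0)
        # The first 429 sets the initial delay; later ones multiply it.
        grown = max(current * self.backoff, self.initial)
        self.delays[key] = min(grown, self.max_delay)
        return self.delays[key]

    def delay_for(self, url, cookies=None):
        """Return the current delay for this request's scope (0.0 if none)."""
        return self.delays.get(self.key_func(url, cookies), 0.0)
```

With `domain_key` as the default, a 429 from any URL slows down the whole host; passing `key_func=endpoint_key` or `key_func=cookie_key` narrows the effect to the endpoint or session that was actually rate limited.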