Support limiting the number of requests per interval
Many websites' open APIs limit the maximum number of requests allowed from one IP address within a certain interval, e.g. 40 requests per minute.
However, the current settings (CONCURRENT_REQUESTS, CONCURRENT_REQUESTS_PER_DOMAIN, CONCURRENT_REQUESTS_PER_IP, and DOWNLOAD_DELAY) all depend on how long requests take to complete, so it is difficult to tune them to match an API's threshold.
To achieve high performance without exceeding the API's threshold, I suggest adding a setting like MAX_REQUESTS_PER_MINUTE.
Thanks!
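Until such a setting exists, a per-minute threshold can be converted into an equivalent DOWNLOAD_DELAY by hand, assuming requests to the target are serialized (CONCURRENT_REQUESTS_PER_DOMAIN = 1). The helper name below is hypothetical, not part of Scrapy:

```python
def delay_for_rate_limit(max_requests_per_minute: int) -> float:
    """Return the DOWNLOAD_DELAY (seconds) that keeps a serialized
    request stream at or under the given per-minute threshold."""
    return 60.0 / max_requests_per_minute

# An API allowing 40 requests/minute needs at least 1.5 s between requests.
delay = delay_for_rate_limit(40)
```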
Issue Analytics
- State:
- Created 11 years ago
- Comments: 6 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
hi @jmaynier I was able to solve this using a combination of CONCURRENT_REQUESTS_PER_DOMAIN, DOWNLOAD_DELAY and download_slot.
To summarise, Scrapy uses the domain of a URL as the "key" to create a download_slot (slots are in charge of download concurrency); that is why CONCURRENT_REQUESTS_PER_DOMAIN works, because Scrapy controls requests per slot.
Now, you can create your own slots to set up custom concurrency, and assign Request objects to a slot for the requests you want controlled together. The way to do it is to pass the slot name in the meta parameter, like this: Request(url, meta={'download_slot': 'mycustomslot'}). If you want more requests to be controlled by the same slot, just keep passing that meta parameter. Requests routed through your custom slot are still controlled by the CONCURRENT_REQUESTS_PER_DOMAIN setting, even though the slot key isn't actually a domain.
Now, what I did to control concurrency per "credential" is simply to emulate the "domain"/"slot" behaviour per credential, which came down to specifying the following settings:
settings.py
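The settings block itself did not survive in this copy of the page. A minimal sketch of what per-slot throttling settings could look like (the values are illustrative assumptions, not the commenter's originals):

```python
# settings.py -- illustrative values only, not the original commenter's.
CONCURRENT_REQUESTS = 16            # overall cap across all slots
CONCURRENT_REQUESTS_PER_DOMAIN = 1  # one in-flight request per slot key
DOWNLOAD_DELAY = 1.5                # seconds between requests in each slot
```

With CONCURRENT_REQUESTS_PER_DOMAIN set to 1 and a DOWNLOAD_DELAY, each custom slot (credential) is throttled independently to roughly one request per DOWNLOAD_DELAY seconds.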
Now, when making requests with your credentials, put a unique identifier per credential (you could keep the credentials in a list and use the list index) into the download_slot meta parameter, and keep passing it on every request you make with that credential; Scrapy will take care of the concurrency of the requests per credential.
NOTE: If you still need to change something in the request before Scrapy actually executes it (downloads it from the site), use a downloader middleware, specifically the process_request method, and change the request there.

@eLRuLL thanks for the detailed explanation!