
feature request - address the 429 "too many requests"


Sites like Wikipedia throttle incoming requests, yielding a 429 error:

[screenshot of the 429 "too many requests" response, 2020-09-12]

It may or may not be a broken link.

Idea: what if we improved the algorithm to slow down for domains that throttle with 429s, tackling all 429 links in a separate, second round, per domain but slower?

Imagine our outgoing requests go out as normal, respecting the --concurrency value; when that first pass completes, we extract all the 429 errors, group them by throttling domain, wait a bit, then slowly retry each link, let’s say at 1 query per second (or slower), with the throttled domains processed concurrently but each domain’s links retried serially.
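
A minimal sketch of that two-pass idea in TypeScript (assuming Node 18+’s global fetch; checkAll and retryThrottled are hypothetical names, not linkinator’s API, and concurrency limiting in the first pass is omitted for brevity):

```ts
const RETRY_DELAY_MS = 1_000; // ~1 query per second, per throttled domain

// First pass: check everything at full speed, collecting links that got a 429.
async function checkAll(urls: string[]): Promise<Map<string, string[]>> {
  const throttled = new Map<string, string[]>(); // hostname -> 429'd links
  await Promise.all(
    urls.map(async (url) => {
      const res = await fetch(url, { method: 'HEAD' });
      if (res.status === 429) {
        const host = new URL(url).hostname;
        throttled.set(host, [...(throttled.get(host) ?? []), url]);
      }
    })
  );
  return throttled;
}

// Second pass: throttled domains run concurrently, but links within one
// domain are retried serially at the slow rate.
async function retryThrottled(throttled: Map<string, string[]>): Promise<void> {
  await Promise.all(
    [...throttled.entries()].map(async ([host, urls]) => {
      for (const url of urls) {
        await new Promise((resolve) => setTimeout(resolve, RETRY_DELAY_MS));
        const res = await fetch(url, { method: 'HEAD' });
        console.log(`${res.status} ${url} (slow retry for ${host})`);
      }
    })
  );
}
```

Running the domains in parallel but each domain’s links serially is what keeps the second pass from hammering any single throttling host.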

Currently…

For example, I’ve got 1042 links, and 11 of them point to Wikipedia. If I tune --concurrency down enough to satisfy Wikipedia, let’s say 2 seconds per request, the whole run takes 1042 × 2 / 60 ≈ 35 minutes. Unbearable, considering it’s for 1% of the links!

If we implemented the feature, it would be 1031 × 0.01 + 11 × 2 ≈ 32 seconds. Reasonable, considering the current 100 req/sec throttle takes 1042 × 0.01 ≈ 10 seconds.

I can exclude Wikipedia via --skip, but we can automate this, can’t we?
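
(For reference, the manual workaround would be something like `linkinator ./dist --recurse --skip "wikipedia\.org"`, assuming --skip accepts a regex of links to exclude as the CLI docs describe.)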

What do you think?

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
revelt commented, Mar 31, 2021

PS. github / npm / wikipedia link checks do work on the latest; I’m using `linkinator ./dist --recurse --concurrency 1`. If anybody is still having 429 problems, limit the concurrency to one. Thank you Justin! 👍

1 reaction
JustinBeckwith commented, Sep 18, 2020

It’s a really good point; right now the crawler is a tad aggressive 😃 Another potential idea for handling this one: I suspect most services that return an HTTP 429 also return a Retry-After header: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Retry-After

When that header is detected we could add the request to a queue that is specific to the subdomain, and then drain it in accordance with the retry guidance coming back from results. It sounds like a lot of fun to build 😃
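
For illustration, a minimal TypeScript sketch of that Retry-After-driven, per-host queue (again assuming Node 18+’s global fetch; the parseRetryAfter and enqueue helpers are hypothetical, not what linkinator actually ships):

```ts
// Retry-After can be a delay in seconds or an HTTP date (per the MDN page above).
function parseRetryAfter(header: string | null): number {
  if (!header) return 1_000; // no guidance: fall back to a 1 s wait
  const seconds = Number(header);
  if (!Number.isNaN(seconds)) return seconds * 1_000;
  const date = Date.parse(header);
  return Number.isNaN(date) ? 1_000 : Math.max(0, date - Date.now());
}

// One promise chain per host acts as a queue, drained per the server's guidance.
const queues = new Map<string, Promise<void>>();

function enqueue(url: string, delayMs: number): Promise<void> {
  const host = new URL(url).hostname;
  const tail = queues.get(host) ?? Promise.resolve();
  const next = tail.then(async () => {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    const res = await fetch(url, { method: 'HEAD' });
    console.log(`${res.status} ${url} (retried per Retry-After)`);
  });
  queues.set(host, next);
  return next;
}

// On a 429, park the link in its host's queue instead of failing it outright.
async function check(url: string): Promise<void> {
  const res = await fetch(url, { method: 'HEAD' });
  if (res.status === 429) {
    await enqueue(url, parseRetryAfter(res.headers.get('retry-after')));
  }
}
```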
