Catalog update loop may anger GitHub's secondary rate limits
Context
When we register a monorepo as a Backstage location, GitHub gets angry at us and reports that secondary rate limits are being hit.
To reproduce, import this catalog-info.yaml either through the UI or by adding it as a location in config: https://github.com/xantier/big-monorepo-fixture/blob/main/catalog-info.yaml. It is a monorepo containing 2000 catalog items.
You might need to run through the (very long) update process multiple times before a single run hits this limit. For easier reproduction, open a few browser tabs and add this location from each of them at the same time.
We are seeing the following errors when this happens:
2021-11-24T13:16:16.719Z backstage error Error: Unable to read url, HttpError: You have exceeded a secondary rate limit. Please wait a few minutes before you try again. type=errorHandler stack=Error: Error: Unable to read url, HttpError: You have exceeded a secondary rate limit. Please wait a few minutes before you try again.
at DefaultLocationService.dryRunCreateLocation (/usr/src/app/node_modules/@backstage/plugin-catalog-backend/dist/index.cjs.js:3960:15)
at runMicrotasks (<anonymous>)
at processTicksAndRejections (internal/process/task_queues.js:95:5)
at async /usr/src/app/node_modules/@backstage/plugin-catalog-backend/dist/index.cjs.js:3080:22
Note that GitHub primary rate limits are not being hit during this, just secondary limits.
2021-11-24T13:16:49.108Z githubAppSupport info {"ratelimitData":{"resources":{"core":{"limit":5000,"used":3159,"remaining":1841,"reset":1637762834},"search":{"limit":30,"used":0,"remaining":30,"reset":1637759869},"graphql":{"limit":5000,"used":9,"remaining":4991,"reset":1637762897},"integration_manifest":{"limit":5000,"used":0,"remaining":5000,"reset":1637763409},"source_import":{"limit":100,"used":0,"remaining":100,"reset":1637759869},"code_scanning_upload":{"limit":500,"used":0,"remaining":500,"reset":1637763409},"actions_runner_registration":{"limit":10000,"used":0,"remaining":10000,"reset":1637763409},"scim":{"limit":15000,"used":0,"remaining":15000,"reset":1637763409}},"rate":{"limit":5000,"used":3159,"remaining":1841,"reset":1637762834}}} type=plugin
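The distinction between the two kinds of limit can be sketched as a small check. This is only an illustration: `RateSnapshot` and `isSecondaryRateLimit` are hypothetical names, not part of Backstage or Octokit. It assumes the error message contains the phrase seen in the log above and that the `core` numbers come from the rate limit payload.

```typescript
// Shape of one entry under "resources" in the rate limit payload logged above.
interface RateSnapshot {
  limit: number;
  used: number;
  remaining: number;
  reset: number; // UTC epoch seconds
}

// Hypothetical helper: treats an error as a secondary rate limit hit when the
// message says so and the primary (core) quota still has requests left.
function isSecondaryRateLimit(errorMessage: string, core: RateSnapshot): boolean {
  const mentionsSecondary = /secondary rate limit/i.test(errorMessage);
  return mentionsSecondary && core.remaining > 0;
}
```

With the values from the log (`remaining: 1841`) and the error message from the stack trace, the check returns `true`: the primary quota is far from exhausted.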
Why is this happening?
GitHub doesn’t seem to like it when the same endpoint is called multiple times concurrently or in rapid succession. See https://docs.github.com/en/rest/overview/resources-in-the-rest-api#secondary-rate-limits
This seems to happen with monorepos, where the Processor/GithubUrlReader loops through the targets listed in the main catalog-info.yaml file and calls the API once per target to fetch more data from the same repository.
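Independently of reducing the number of calls, a client that has already been limited can at least wait the right amount of time before retrying. The following is a minimal sketch, assuming the response headers behave as GitHub's REST documentation describes (`retry-after` in seconds, `x-ratelimit-remaining`, `x-ratelimit-reset` in UTC epoch seconds). `HeaderMap` and `retryDelayMs` are hypothetical names, and the one-minute fallback is an assumption based on that guidance, not something Backstage does today.

```typescript
// Lower-cased response header names mapped to their raw string values.
type HeaderMap = { [name: string]: string | undefined };

// Hypothetical helper: how long to wait (in ms) before retrying a request
// that was rejected by a rate limit.
function retryDelayMs(headers: HeaderMap, nowMs: number): number {
  const retryAfter = headers['retry-after'];
  if (retryAfter !== undefined && retryAfter !== '' && !isNaN(Number(retryAfter))) {
    // retry-after is expressed in seconds.
    return Math.max(0, Number(retryAfter) * 1000);
  }
  const reset = headers['x-ratelimit-reset'];
  if (headers['x-ratelimit-remaining'] === '0' && reset !== undefined && !isNaN(Number(reset))) {
    // The quota is exhausted: wait until the reset time.
    return Math.max(0, Number(reset) * 1000 - nowMs);
  }
  // Assumed fallback when no header gives a hint: wait one minute.
  return 60000;
}
```

A caller would pass the headers of the failed response and `Date.now()`, then sleep for the returned duration before issuing the next request, ideally one request at a time rather than concurrently.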
Feature Suggestion & Possible Implementation
Modify the processors that retrieve data so that they are aware of the location of the files they are fetching. Add logic to retrieve the needed files with a single call to the third-party API. This can be done at a few different levels if a hierarchical structure of targets is constructed across multiple YAML files.
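As a rough illustration of the first step of that suggestion, targets could be grouped by repository and ref before anything is fetched, so that each group can be served by one request instead of one request per file. This is a sketch under stated assumptions, not the catalog's actual implementation: `RepoBatch` and `groupTargetsByRepo` are hypothetical names, only `https://github.com/<owner>/<repo>/blob/<ref>/<path>` URLs are handled, and refs containing a slash are not supported.

```typescript
// One group of files that live in the same repository at the same ref.
interface RepoBatch {
  owner: string;
  repo: string;
  ref: string;
  paths: string[];
}

// Matches https://github.com/<owner>/<repo>/blob/<ref>/<path>.
// Assumption: the ref contains no slash (e.g. "main").
const BLOB_URL = /^https:\/\/github\.com\/([^/]+)\/([^/]+)\/blob\/([^/]+)\/(.+)$/;

// Hypothetical helper: buckets target URLs so that each repository and ref
// pair appears once, with all requested file paths collected under it.
function groupTargetsByRepo(urls: string[]): RepoBatch[] {
  const batches: { [key: string]: RepoBatch } = {};
  for (const url of urls) {
    const match = BLOB_URL.exec(url);
    if (!match) {
      continue; // Not a recognised GitHub blob URL; skipped in this sketch.
    }
    const owner = match[1];
    const repo = match[2];
    const ref = match[3];
    const path = match[4];
    const key = owner + '/' + repo + '@' + ref;
    if (!batches[key]) {
      batches[key] = { owner: owner, repo: repo, ref: ref, paths: [] };
    }
    batches[key].paths.push(path);
  }
  return Object.keys(batches).map(function (key) { return batches[key]; });
}
```

For the fixture above, all 2000 targets would collapse into a single batch. Each batch could then be resolved with one repository-level request, for example a repository archive download or a recursive tree listing, and the wanted paths picked out of the result locally.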
Issue Analytics
- Created 2 years ago
- Reactions: 6
- Comments: 17 (12 by maintainers)
Yep definitely! You’d need to stick to something a bit more strict most likely, perhaps plain catalog-info.yaml, *-info.yaml and any other pattern 😁
Should note ofc that this is another reason to move the webhooks work forward.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.