Large github-org location causes GHE API Rate Limit errors
Expected Behavior
Browsing the base Backstage catalog does not trigger GHE API Rate Limits, and my browser doesn’t show repetitive failures to load GHE Avatar images.
Current Behavior
Today I added a location to load my github-org from our GHE instance, which is a great feature. Earlier in the day I played around with it a bit, viewing groups and seeing the users load, and it works really well. But this evening, after working only a short while and viewing just 1-2 sample catalog items, I noticed in Chrome DevTools that I had triggered the GHE API Rate Limit abuse protection. DevTools appears to show Avatar pictures being loaded in the background even when I am only visiting the main catalog page (not the Group page).
GET https://github.companyName.com/avatars/t/2222 429 (Too Many Requests)
My backend shows this while loading the only location I have defined, every two minutes:
2020-12-16T02:21:24.056Z catalog info Read 826 entities from location bootstrap:bootstrap in 6.0s
2020-12-16T02:21:24.160Z catalog info Posting update success markers
2020-12-16T02:21:24.180Z catalog info Wrote 826 entities from location bootstrap:bootstrap in 124ms
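My guess is that this alone is enough: a rate limit of 5000, divided by 826 API hits per refresh (assuming one hit per entity), allows about 6 refreshes. At one refresh every 2 minutes, that is roughly 12 minutes until I reach my API rate limit, not to mention the avatars being loaded from my web browser.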
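The estimate above can be written out as a small calculation. This is only a sketch of the arithmetic in this report: the function name is made up for illustration, and the one-API-call-per-entity figure is the same unverified assumption as in the text, not a measured value.

```python
def minutes_until_rate_limited(hourly_limit, calls_per_refresh, refresh_minutes):
    """Estimate how long a refresh loop runs before the API budget is spent.

    Assumes (as the report does) that every entity costs one API call
    on every refresh. Real usage may be lower or higher.
    """
    if calls_per_refresh <= 0:
        raise ValueError("calls_per_refresh must be positive")
    # Number of complete refreshes that fit in the budget.
    full_refreshes = hourly_limit // calls_per_refresh
    return full_refreshes * refresh_minutes


# Figures from the report: 5000 calls/hour, 826 entities, 2-minute refresh.
print(minutes_until_rate_limited(5000, 826, 2))  # prints 12
```

Browser-side avatar requests are not counted here, so the real time to hit the limit could be shorter.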
So I only noticed this because of the Avatar errors, and I don’t know if they’re exacerbating the issue or if the root is just the frequency of loading the GHE org location.
Possible Solution
I’m going to take a guess that Avatar images are somehow being forward-loaded in some way, and potentially repetitively (wild guess) which may have triggered the API rate limit. In addition, it’s likely the catalog refresh is repeating frequently enough to help trigger this.
Suspicions:
- Reduce frequency of refreshing catalog
- Cache certain elements for a longer time frame
- Don’t forward-load Avatars unless they are used on a given page (not sure if that is happening, but it seems like it…?)
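To illustrate the caching suggestion in the list above, here is a minimal time-to-live cache. It is a generic sketch and not how Backstage or GHE handle avatars or catalog data; `TTLCache`, `get_or_load`, and the `loader` callback are hypothetical names chosen for this example.

```python
import time
from typing import Callable, Dict, Generic, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Keep loaded values for ttl_seconds so repeated lookups skip the API."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the cache can be tested without sleeping
        self._entries: Dict[str, Tuple[float, V]] = {}

    def get_or_load(self, key: str, loader: Callable[[], V]) -> V:
        now = self._clock()
        entry = self._entries.get(key)
        # Serve the cached value while it is still fresh.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Otherwise call the (expensive, rate-limited) loader once and remember it.
        value = loader()
        self._entries[key] = (now, value)
        return value
```

A caller would wrap each rate-limited fetch, for example `cache.get_or_load("avatars/t/2222", fetch_avatar)`, where `fetch_avatar` stands in for whatever function performs the real request. A longer TTL means fewer API hits at the cost of staler data.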
Steps to Reproduce
- Configure GHE (not GitHub.com)
- Load a GHE org via app-config.yaml:
    locations:
      - type: github-org
        target: https://github.mycompany.com/my-favorite-org
- View the catalog homepage a few times, click around
- Open Chrome DevTools and notice that your PAT has hit a GHE API Rate Limit
Context
- I have no customizations, so I’ll assume my GHE Personal Access Token, which is used for the locations integration, has the default of 5000 hits per hour. I’ve had this setup for weeks, and only after tying in my org have I triggered this limit. And this is for just one user (me!) on my personal laptop, not dozens or hundreds of users.
- The org I loaded has > 600 people and > 150 teams.
Your Environment
- NodeJS Version (v12): v12.16.2
- Operating System and Version (e.g. Ubuntu 14.04): Windows 10 1909 WSL2 Ubuntu 18.04
- Browser Information: Edge (Chrome engine 87)
Issue Analytics
- Created 3 years ago
- Comments: 7 (4 by maintainers)
Top GitHub Comments
Funnily enough we mentioned this yesterday when doing planning for Q1 next year. Unfortunately, personal access tokens are not a great way to put this stuff into production and load production amounts of data into the catalog.
We’re looking at implementing GitHub Apps that you can install at the Org level, which would give us a much higher limit.
As another workaround you could maybe try changing the poll interval, but of course this is not a stable solution either unfortunately.
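As a rough guide for picking a poll interval, the following sketch computes the shortest refresh interval that stays inside an hourly budget. It reuses the unverified one-call-per-entity assumption from the issue, the function name and the `headroom` parameter are hypothetical, and it says nothing about which Backstage setting controls the interval.

```python
import math


def min_refresh_minutes(hourly_limit, calls_per_refresh, headroom=0.0):
    """Shortest refresh interval (minutes) that keeps usage under the limit.

    headroom is the fraction of the budget reserved for other traffic,
    such as other integrations that share the same token.
    """
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    usable_calls = hourly_limit * (1.0 - headroom)
    # Whole refreshes that fit in one hour of budget.
    refreshes_per_hour = math.floor(usable_calls / calls_per_refresh)
    if refreshes_per_hour < 1:
        raise ValueError("a single refresh already exceeds the hourly budget")
    return 60.0 / refreshes_per_hour


# Figures from the issue: 5000 calls/hour and 826 entities.
print(min_refresh_minutes(5000, 826))       # prints 10.0
print(min_refresh_minutes(5000, 826, 0.2))  # prints 15.0
```

Under these assumptions a 2-minute interval is about five times too frequent for a single personal access token, which matches the maintainer's point that changing the interval is a workaround and not a stable fix.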
How do we reduce the catalog refresh limit?