[RFC] The Future of Locations
This RFC suggests some changes to the concepts of “locations” within the service catalog, to enable a clearer model, add some features, and remove some old limitations.
The short version:
- Remove the `locations` table
- Make full use of the `Location` kind entities
- Change the refresh loop such that entities are self-sufficient and update “themselves”, and `Location`s are mainly seen as a bootstrapping mechanism
Background
The Backstage software catalog has inherited the concept of “locations” from the old catalog used internally at Spotify. The basic idea is that users register such locations - for example a URL pointing to a yaml file in GitHub - that ultimately lead to raw entity data. Then, the catalog makes sure to keep itself up to date with changes in that location. This is the primary means of populating the catalog.
The way that the software catalog implemented these locations has become limiting, primarily:
- They are a concept entirely of their own, in a separate table. This leads to duplication of effort to get logs and statuses stored in the database.
- There is a foreign key relationship between the entity and location tables that restricts or confuses the ability to add data in more flexible ways (through separate POST, through automated migrations, etc.).
- There exist both the `locations` table AND entities of kind `Location`, leading to confusion.
- There is some lack of clarity on how locations should map to the lifecycle of entities, in particular regarding deletions and moving from one location to another.
This RFC proposes changes to the way that locations are used, both under the hood and as an end user of the APIs, to enable better use of this facility.
Scope
The following pieces of the catalog model are affected by this RFC:
- The `/locations` section of the catalog API: This is where locations are added and removed.
- The `locations` database table: When a location is added to the catalog, it ends up being written to this table.
- The `entities` database table: Entities have a foreign key pointing to the location that spawned them.
- Entity annotations: The annotation `backstage.io/managed-by-location` is added to entities automatically, pointing to the actual location type/target that the entity was read from.
- The location refresh loop: There is a loop inside the catalog backend that periodically goes through all locations, reads them, and updates entities accordingly.
Proposal
We suggest that locations as a separate table and concept are eliminated, and that we instead lean into using the `Location` entity kind. Users will register `Location` entities using direct POST requests to the entities API, and these will be used as sources for finding more entity data during the update loop.
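As a rough sketch of what that registration flow could look like (the endpoint path and payload shape below are illustrative assumptions, not a settled API):

```ts
// Illustrative sketch: registering a Location by POSTing a kind: Location
// entity straight to the entities API. Endpoint and payload shape are
// assumptions for this RFC, not the current catalog API.
const locationEntity = {
  apiVersion: 'backstage.io/v1alpha1',
  kind: 'Location',
  metadata: {
    name: 'petstore-catalog-info',
    namespace: 'default',
  },
  spec: {
    type: 'url',
    target: 'https://github.com/acme/petstore/blob/main/catalog-info.yaml',
  },
};

async function registerLocation(baseUrl: string): Promise<void> {
  const response = await fetch(`${baseUrl}/entities`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(locationEntity),
  });
  if (!response.ok) {
    throw new Error(`Failed to register location: ${response.status}`);
  }
}
```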
This has a number of implications.
The `entities` table will have its `location_id` column dropped and the corresponding field removed from the TS types.
The `locations` table will be dropped and the corresponding TS types removed. In doing so, the migration step will have to translate each entry into a `kind: Location` entity in the `entities` table, with the corresponding data. This also means dropping the `location_update_log` table and the `location_update_log_latest` view.
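A minimal sketch of such a migration, assuming the catalog keeps using Knex migrations; the column layout of the `entities` table is assumed here purely for illustration:

```ts
import { Knex } from 'knex';

// Illustrative sketch: translate each row of the old locations table into a
// kind: Location entity, then drop the location-related tables and view.
export async function up(knex: Knex): Promise<void> {
  const locations = await knex('locations').select('id', 'type', 'target');

  for (const location of locations) {
    await knex('entities').insert({
      // The exact entities column layout is an assumption for this sketch.
      kind: 'Location',
      name: `generated-${location.id}`,
      namespace: 'default',
      spec: JSON.stringify({ type: location.type, target: location.target }),
    });
  }

  // Depending on the database, the foreign key constraint may need to be
  // dropped explicitly before the column and referenced table go away.
  await knex.schema.alterTable('entities', table => {
    table.dropColumn('location_id');
  });

  await knex.raw('DROP VIEW IF EXISTS location_update_log_latest');
  await knex.schema.dropTableIfExists('location_update_log');
  await knex.schema.dropTableIfExists('locations');
}

export async function down(_knex: Knex): Promise<void> {
  // The old tables cannot be faithfully reconstructed from Location entities alone.
  throw new Error('This migration is not reversible');
}
```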
The `LocationsCatalog` type and its implementation are removed, and all of the corresponding routes are removed.
The addition of an entity via POST request needs to gain the ability to do a “bootstrap refresh run” if the entity happens to be a Location, like the old locations do upon creation, to verify that the location is valid and useful.
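A sketch of how the POST handler might gain that behavior; the dependency names and shapes here are hypothetical stand-ins, not existing catalog APIs:

```ts
import express from 'express';

// Shapes assumed for illustration only.
type Entity = {
  apiVersion: string;
  kind: string;
  metadata: { name: string; namespace?: string };
  spec?: Record<string, unknown>;
};

type Deps = {
  storeEntity(entity: Entity): Promise<Entity>;
  // Reads the location target once and throws if it is unreachable or invalid.
  runBootstrapRefresh(entity: Entity): Promise<void>;
};

export function createEntitiesRouter({ storeEntity, runBootstrapRefresh }: Deps) {
  const router = express.Router();

  router.post('/entities', async (req, res) => {
    const entity = req.body as Entity;
    const stored = await storeEntity(entity);

    // If the new entity is a Location, do an immediate "bootstrap refresh run"
    // so the caller learns right away whether the target is valid and useful.
    if (entity.kind === 'Location') {
      try {
        await runBootstrapRefresh(stored);
      } catch (error) {
        res.status(400).json({ error: `Location could not be read: ${error}` });
        return;
      }
    }

    res.status(201).json(stored);
  });

  return router;
}
```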
The frontend code needs to be updated to create an entity instead of using the separate API, when users register a location.
The mock-data scripts need to be updated similarly, as do some other parts of the frontend code (e.g. the button for mocking data into an empty catalog).
The frontend may want to assist the user when unregistering entities. If unregistering a Location, the user should be asked whether they also want to unregister the other entities that belong to it (e.g. entities whose `backstage.io/managed-by-location` annotation points to the same type/target). If unregistering a non-Location, and there exists a Location entity that points to the same type/target, the user should be informed about that, because otherwise there is a risk that the same entity almost instantly reappears as part of the refresh loop of that Location.
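For example, the frontend could look up sibling entities via the annotation before confirming the unregistration. A sketch under assumed entity shapes; the `type:target` annotation format is also an assumption for illustration:

```ts
const MANAGED_BY_LOCATION = 'backstage.io/managed-by-location';

type Entity = {
  kind: string;
  metadata: { name: string; annotations?: Record<string, string> };
  spec?: Record<string, unknown>;
};

// Given the Location entity being unregistered, find all other entities that
// were spawned from the same type/target and could be offered for removal too.
export function findColocatedEntities(
  location: Entity,
  allEntities: Entity[],
): Entity[] {
  // Assumed annotation format: "<type>:<target>", mirroring the Location spec.
  const ref = `${location.spec?.type}:${location.spec?.target}`;
  return allEntities.filter(
    e =>
      e !== location &&
      e.metadata.annotations?.[MANAGED_BY_LOCATION] === ref,
  );
}
```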
Finally, the refresh loop inside the catalog backend needs to be updated. It would look for all Location entities and fetch the contents of all of them. The resulting entities will be given a `backstage.io/managed-by-location` annotation pointing to the same type/target as the Location that spawned them, and stored. Then, the loop will look at all the remaining entities that already have a `backstage.io/managed-by-location` annotation but haven't yet been updated (meaning they once had a Location, but it has since been removed), and fetch updated contents for them too.
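A condensed sketch of that loop; the reader and storage functions are hypothetical stand-ins for the real catalog internals:

```ts
const MANAGED_BY_LOCATION = 'backstage.io/managed-by-location';

type Entity = {
  kind: string;
  metadata: { name: string; annotations?: Record<string, string> };
  spec?: Record<string, unknown>;
};

type Deps = {
  listEntities(): Promise<Entity[]>;
  // Reads the raw entity data behind a type/target reference, e.g. a YAML file.
  readLocation(ref: string): Promise<Entity[]>;
  storeEntities(entities: Entity[]): Promise<void>;
};

export async function refreshLoopOnce({ listEntities, readLocation, storeEntities }: Deps) {
  const all = await listEntities();
  const refreshed = new Set<string>();

  // Pass 1: read every Location entity and store what it spawns, stamping the
  // managed-by-location annotation with the Location's own type/target.
  for (const location of all.filter(e => e.kind === 'Location')) {
    const ref = `${location.spec?.type}:${location.spec?.target}`;
    const spawned = await readLocation(ref);
    for (const entity of spawned) {
      entity.metadata.annotations = {
        ...entity.metadata.annotations,
        [MANAGED_BY_LOCATION]: ref,
      };
      refreshed.add(entity.metadata.name);
    }
    await storeEntities(spawned);
  }

  // Pass 2: entities that carry the annotation but were not touched above once
  // had a Location that has since been removed; refresh them individually.
  for (const entity of all) {
    const ref = entity.metadata.annotations?.[MANAGED_BY_LOCATION];
    if (ref && !refreshed.has(entity.metadata.name)) {
      await storeEntities(await readLocation(ref));
    }
  }
}
```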
This way, entities will be self-sufficient in terms of refreshes. Locations will be regarded more as bootstrapping sources that continuously spawn entities, which can then be refreshed on their own.
Top GitHub Comments
It’s time to close this RFC! For the interested reader, here’s a summary of what the design ended up looking like.
The core of the catalog processing engine has seen a major rewrite, with the goal of being able to properly distribute work among multiple catalog service instances and of opening up for future capabilities. As part of this rewrite, we decided to keep the `locations` table intact. However, on a code level, a new interface called `EntityProvider` was broken out from the previous monolith. There are two providers out of the box - the app-config based provider and the location database based provider. They both serve as “seeds” of machine-generated `Location` kind entities into the processing loops. After that, the regular catalog processors go to work on the emitted locations, which ultimately results in a directed graph of processed entities that relate to each other.
This also opened up the future possibility of making a third `EntityProvider` - one that has its own backing entity storage that users can POST/PUT/DELETE full entities into via a REST API, and which is then also “seeded” into the processing loops. Nothing else in the catalog core has to change for this. This provider is probably not something we’ll build ourselves, but it can easily be contributed by the community.
The discussions also touched on event-driven updates. That is another possibility which was largely enabled by the catalog rewrite, but the mechanics of what the interface for such updates should look like is still under consideration.
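As a rough idea of the shape described above, the interface might look roughly like the following; the names and method signatures are an approximation for this summary, not an exact copy of the final API:

```ts
// Approximate sketch of the EntityProvider shape described above.
type DeferredEntity = { entity: unknown; locationKey?: string };

interface EntityProviderConnection {
  // Replace (or incrementally adjust) the set of entities this provider owns.
  applyMutation(mutation: {
    type: 'full';
    entities: DeferredEntity[];
  }): Promise<void>;
}

interface EntityProvider {
  getProviderName(): string;
  // Called once at startup; the provider keeps the connection and pushes
  // "seed" entities (such as machine-generated Location entities) through it.
  connect(connection: EntityProviderConnection): Promise<void>;
}
```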
Thanks for all of your input and valuable discussion on this RFC. ❤️ Let us know on Discord or in issues if there’s additional design or further needs that should be considered.
Yeah separate RFC might make sense, but tbh it’s very tied to this.
Bringing in some more descriptions to avoid confusion. First off, a figure to illustrate the catalog processing loop:
[Figure: the catalog processing loop]
Given that processing pipeline, we can view the catalog processing as a tree of entities that emit additional entities for processing. In this case the root of the processing is a static configuration of 3 locations, 2 of them for GitHub, and one for a location DB.
Given the above full refresh of the catalog, we’d end up with a catalog structure like the one shown below, although the part of the catalog we’re mostly interested in is the union of all leaf nodes. The graph is really also a DAG, where we make sure each location is only processed once.
Given this catalog, we can now handle location updates by re-processing the sub-tree of the graph underneath the updated location.
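A tiny sketch of that idea, treating processed entities as nodes in a tree and re-processing only the subtree under a changed location; the node shape and process function are hypothetical:

```ts
// Hypothetical node shape: each processed entity remembers which entities it
// emitted, forming the tree/DAG described above.
type ProcessingNode = {
  entityRef: string;
  children: ProcessingNode[];
};

type Deps = {
  // Re-reads and re-processes a single entity, returning its freshly emitted children.
  process(entityRef: string): Promise<ProcessingNode[]>;
};

// When a location changes, only its subtree needs to be walked again; the rest
// of the catalog is left untouched.
export async function reprocessSubtree(
  node: ProcessingNode,
  deps: Deps,
  seen: Set<string> = new Set(), // guards against processing a node twice (DAG)
): Promise<void> {
  if (seen.has(node.entityRef)) {
    return;
  }
  seen.add(node.entityRef);

  node.children = await deps.process(node.entityRef);
  for (const child of node.children) {
    await reprocessSubtree(child, deps, seen);
  }
}
```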