
[RFC] The Future of Locations


The Future of Locations

This RFC suggests some changes to the concepts of “locations” within the service catalog, to enable a clearer model, add some features, and remove some old limitations.

The short version:

  • Remove the locations table
  • Make full use of the Location kind entities
  • Change the refresh loop such that entities are self-sufficient and update “themselves”, and Locations are mainly seen as a bootstrapping mechanism

Background

The Backstage software catalog has inherited the concept of “locations” from the old catalog used internally at Spotify. The basic idea is that users register such locations - for example a URL pointing to a yaml file in GitHub - that ultimately lead to raw entity data. Then, the catalog makes sure to keep itself up to date with changes in that location. This is the primary means of populating the catalog.
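To make "raw entity data" concrete, here is a sketch of what a registered location's file might contain, expressed as a TypeScript object. The envelope fields (apiVersion/kind/metadata/spec) follow the Backstage entity format; the component name, owner, and values are made up for illustration.

```typescript
// Minimal sketch of the entity envelope that a location's yaml file resolves to.
interface EntityEnvelope {
  apiVersion: string;
  kind: string;
  metadata: { name: string; annotations?: Record<string, string> };
  spec?: Record<string, unknown>;
}

// Example raw entity data, as it might appear after parsing a catalog yaml file.
const rawEntity: EntityEnvelope = {
  apiVersion: 'backstage.io/v1alpha1',
  kind: 'Component',
  metadata: { name: 'petstore-backend' },
  spec: { type: 'service', lifecycle: 'production', owner: 'team-a' },
};
```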

The way that the software catalog implemented these locations has become limiting, primarily:

  • They are a concept entirely of their own, stored in a separate table. This leads to duplicated effort in getting logs and statuses stored in the database.
  • There is a foreign key relationship between the entity and location tables that restricts, or at least confuses, the ability to add data in more flexible ways (through separate POSTs, through automated migrations, etc.).
  • Both the location table AND entities of kind Location exist, leading to confusion.
  • It is unclear how locations should map to the lifecycle of entities, in particular regarding deletions and moving an entity from one location to another.

This RFC proposes changes to the way that locations are used, both under the hood and as an end user of the APIs, to enable better use of this facility.

Scope

The following pieces of the catalog model are affected by this RFC:

  • The /locations section of the catalog API: This is where locations are added and removed.
  • The locations database table: When a location is added to the catalog, it ends up being written to this table.
  • The entities database table: Entities have a foreign key pointing to the location that spawned them.
  • Entity annotations: The annotation backstage.io/managed-by-location is added to entities automatically, pointing to the actual location type/target that the entity was read from.
  • The location refresh loop: There is a loop inside the catalog backend that periodically goes through all locations, reads them, and updates entities accordingly.
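The managed-by-location annotation mentioned above stores a type/target pair. A small sketch of how such a reference could be split back into its parts (the parsing helper is illustrative, not the catalog's actual implementation; the URL is made up):

```typescript
// Sketch: the annotation value is a "<type>:<target>" pair, where the target
// itself may contain colons, so only the first colon delimits the type.
function parseLocationRef(ref: string): { type: string; target: string } {
  const idx = ref.indexOf(':');
  if (idx < 0) throw new Error(`Invalid location ref: ${ref}`);
  return { type: ref.slice(0, idx), target: ref.slice(idx + 1) };
}

const ref = parseLocationRef(
  'url:https://github.com/acme/service/blob/main/catalog-info.yaml',
);
```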

Proposal

We suggest that locations as a separate table and concept are eliminated, and that we instead lean into using the Location entity kind. Users will register Location entities using direct POST requests to the entities API, and these will be used as sources for finding more entity data during the update loop.
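A sketch of what registering a Location could look like under this proposal, as a direct POST of an entity. The endpoint path, entity name, and target URL are assumptions for illustration; only the Location kind and spec shape follow the RFC.

```typescript
// A Location entity as it would be POSTed to the entities API.
const locationEntity = {
  apiVersion: 'backstage.io/v1alpha1',
  kind: 'Location',
  metadata: { name: 'petstore-repo' },
  spec: {
    type: 'url',
    target: 'https://github.com/acme/petstore/blob/main/catalog-info.yaml',
  },
};

// Hypothetical registration call; the `/entities` path is an assumption.
async function registerLocation(baseUrl: string): Promise<Response> {
  return fetch(`${baseUrl}/entities`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(locationEntity),
  });
}
```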

This has a number of implications.

The entities table will have its location_id column dropped and the corresponding field removed from the TS types.

The locations table will be dropped and the corresponding TS types removed. In doing so, the migration step will have to translate each entry into a kind: Location entity in the entities table, with the corresponding data. This also means dropping the location_update_log table and the location_update_log_latest view.
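The migration step could be sketched as a pure mapping from a row of the old locations table to a kind: Location entity. The row shape mirrors the fields the RFC implies (id, type, target); the name-derivation scheme is an assumption.

```typescript
// Sketch of translating an old `locations` table row into a Location entity.
interface LocationRow {
  id: string;
  type: string;
  target: string;
}

function locationRowToEntity(row: LocationRow) {
  return {
    apiVersion: 'backstage.io/v1alpha1',
    kind: 'Location',
    metadata: {
      // A name must be derived somehow; reusing the old row id is one option.
      name: `migrated-${row.id}`,
    },
    spec: { type: row.type, target: row.target },
  };
}
```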

The LocationsCatalog type and its implementation are removed and all of the corresponding routes are removed.

The addition of an entity via POST request needs to gain the ability to do a “bootstrap refresh run” if the entity happens to be a Location, just like the old locations do upon creation, to verify that the location is valid and useful.

The frontend code needs to be updated to create an entity instead of using the separate API, when users register a location.

The mock-data scripts need to be updated similarly, as do some other parts of the frontend code (e.g. the button for mocking data into an empty catalog).

The frontend may want to assist the user when unregistering entities. When unregistering a Location, the user should be asked whether they also want to unregister the other entities that belong to it (i.e. those whose backstage.io/managed-by-location annotation points to the same type/target). When unregistering a non-Location, and there exists a Location entity that points to the same type/target, the user should be informed of that, because otherwise the same entity is likely to reappear almost instantly as part of that Location’s refresh loop.
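The frontend check described above amounts to matching the managed-by-location annotation against the Location's type/target. A sketch, with a simplified entity shape (the annotation key is the real one from this RFC; everything else is illustrative):

```typescript
// Find entities that were spawned by a given location, so the frontend can
// offer to unregister them together with the Location entity itself.
interface Entity {
  kind: string;
  metadata: { name: string; annotations?: Record<string, string> };
}

function entitiesManagedBy(entities: Entity[], locationRef: string): Entity[] {
  return entities.filter(
    e =>
      e.metadata.annotations?.['backstage.io/managed-by-location'] ===
      locationRef,
  );
}
```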

Finally, the refresh loop inside the catalog backend needs to be updated. It would look for all Location entities and fetch the contents of each of them. The resulting entities are given a backstage.io/managed-by-location annotation pointing to the same type/target as the Location that spawned them, and are then stored. After that, the loop looks at all remaining entities that already carry a backstage.io/managed-by-location annotation but haven’t yet been updated (meaning they once had a Location that has since been removed), and fetches updated contents for them too.
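The two-pass loop could be sketched as follows. This is a highly simplified, synchronous illustration: the entity shape, the in-memory store, and the readLocation function are all assumptions, not the catalog's actual implementation.

```typescript
interface Entity {
  kind: string;
  metadata: { name: string; annotations?: Record<string, string> };
  spec?: { type?: string; target?: string };
}

const ANNOTATION = 'backstage.io/managed-by-location';

function refreshAll(
  store: Entity[],
  readLocation: (ref: string) => Entity[],
): Entity[] {
  const refreshed = new Map<string, Entity>();
  // Pass 1: fetch the contents of every Location entity, stamping each
  // result with an annotation pointing back at the spawning location.
  for (const loc of store.filter(e => e.kind === 'Location')) {
    const ref = `${loc.spec?.type}:${loc.spec?.target}`;
    for (const entity of readLocation(ref)) {
      entity.metadata.annotations = {
        ...entity.metadata.annotations,
        [ANNOTATION]: ref,
      };
      refreshed.set(entity.metadata.name, entity);
    }
  }
  // Pass 2: entities that carry the annotation but were not refreshed in
  // pass 1 (their Location was removed) are refreshed on their own.
  for (const e of store) {
    const ref = e.metadata.annotations?.[ANNOTATION];
    if (ref && !refreshed.has(e.metadata.name)) {
      for (const entity of readLocation(ref)) {
        refreshed.set(entity.metadata.name, entity);
      }
    }
  }
  return [...refreshed.values()];
}
```

The key property is that pass 2 keeps previously spawned entities up to date even after their Location is gone, which is what makes entities self-sufficient.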

This way, entities will be self-sufficient in terms of refreshes. Locations will be regarded more as bootstrapping sources, that continuously spawn entities that can then be refreshed on their own.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 6
  • Comments: 47 (44 by maintainers)

Top GitHub Comments

freben commented, Aug 10, 2021 (3 reactions)

It’s time to close this RFC! For the interested reader, here’s a summary of what the design ended up looking like.

The core of the catalog processing engine has seen a major rewrite, with the goal of being able to properly distribute work among multiple catalog service instances and opening up for future capabilities. As part of this rewrite, we decided to keep the locations table intact. However, on a code level, a new interface called EntityProvider was broken out from the previous monolith. There are two providers out of the box - the app-config based provider, and the location database based provider. They both serve as “seeds” of machine generated Location kind entities into the processing loops. After that, the regular catalog processors go to work on the emitted locations which ultimately results in a directed graph of processed entities that relate to each other.
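The EntityProvider seam described in the comment could be sketched as below. The method names follow the interface loosely; the exact signatures and mutation shape are assumptions for illustration, and the static provider is a made-up example of the "app-config based" style of seeding.

```typescript
interface DeferredEntity {
  entity: { apiVersion: string; kind: string; metadata: { name: string } };
}

// Sketch: the connection a provider uses to push entities into the
// processing loops.
interface EntityProviderConnection {
  applyMutation(mutation: {
    type: 'full';
    entities: DeferredEntity[];
  }): Promise<void>;
}

// Sketch of the provider seam: each provider names itself and connects.
interface EntityProvider {
  getProviderName(): string;
  connect(connection: EntityProviderConnection): Promise<void>;
}

// A provider that seeds a fixed list of Location entities, in the spirit of
// the app-config based provider.
class StaticLocationProvider implements EntityProvider {
  constructor(private readonly entities: DeferredEntity[]) {}
  getProviderName(): string {
    return 'static-location-provider';
  }
  async connect(connection: EntityProviderConnection): Promise<void> {
    await connection.applyMutation({ type: 'full', entities: this.entities });
  }
}
```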

This also opened up the future possibility of making a third EntityProvider - one that has its own backing entity storage that users can POST/PUT/DELETE full entities into via a REST API, and which then is also “seeded” into the processing loops. Nothing else in the catalog core has to change for this. This provider is probably not something we’ll build ourselves but it can easily be contributed by the community.

The discussions also touched on event driven updates. That is another possibility which was largely enabled by the catalog rewrite, but the mechanics of what the interface for such updates should look like, is still under consideration.

Thanks for all of your input and valuable discussion on this RFC. ❤️ Let us know on Discord or in issues if there’s additional design or further needs that should be considered.

Rugvip commented, Aug 21, 2020 (3 reactions)

Yeah, a separate RFC might make sense, but tbh it’s very tied to this.

Bringing in some more descriptions to avoid confusion. First off, a figure to illustrate the catalog processing loop:

[Figure: Pipeline]

Given that processing pipeline, we can view the catalog processing as a tree of entities that emit additional entities for processing. In this case the root of the processing is a static configuration of 3 locations, 2 of them for GitHub, and one for a location DB.

[Figure: Full Refresh]

Given the above full refresh of the catalog, we’d end up with a catalog structure looking like the one below, although the part of the catalog we’re mostly interested in is the union of all leaf nodes. The graph is really a DAG, where we make sure each location is only processed once.

[Figure: Catalog]

Given this catalog, we can now handle location updates by re-processing the sub-tree of the graph underneath the updated location.
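The subtree refresh could be sketched as a reachability walk over the processing graph. The adjacency-map representation is an assumption; the point is that each node is visited once, so shared children in the DAG are not processed twice.

```typescript
// Collect all nodes reachable from an updated location, i.e. the subtree
// (really a sub-DAG) that needs re-processing.
function subtreeOf(edges: Map<string, string[]>, root: string): Set<string> {
  const seen = new Set<string>([root]);
  const queue = [root];
  while (queue.length > 0) {
    const node = queue.shift()!;
    for (const child of edges.get(node) ?? []) {
      // Visit each node once, so a child shared by two parents is only
      // scheduled for re-processing a single time.
      if (!seen.has(child)) {
        seen.add(child);
        queue.push(child);
      }
    }
  }
  return seen;
}
```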

[Figure: Subtree Refresh]
