[RFC] Common pattern for plugins with multiple instances of external systems
Status: Open for comments
Need
Plugins often deal with external systems and reference things inside them from annotations, e.g. `jenkins.io/github-folder: 'folder-name/project-name'` or `datadoghq.com/graph-token: <TOKEN>`.
Sometimes (especially for SaaS systems) there is a single instance and references are unambiguous (e.g. there is a single datadoghq.com, and a graph token is a reference into that single namespace regardless of whose graph it is).
In other scenarios, there are multiple instances of an external system and each is its own namespace (i.e. a jenkins job called “my-service/build-job” is only meaningful in the context of a single jenkins instance).
There are many reasons for wanting multiple instances of an external system, including poor organisation design (which backstage would be a good first step in fixing 😉), acquisitions, or isolation between teams (perhaps we deliberately provision a separate jenkins instance for each team).
Finally, I think this is a pattern which can be applied to a significant number of plugins and a consistent approach would be easier to understand for users.
Proposal
When defining a configuration schema for your backend plugin, allow for multiple named external systems (in this kafka example, a cluster is a named external system):
```yaml
kafka:
  clientId: backstage
  clusters:
    - name: cluster-name
      brokers:
        - localhost:9092
```
When defining the value of an annotation, include the name of the external system as a `/`-separated prefix, e.g. for kafka, the annotation is:

```yaml
kafka.apache.org/consumer-groups: cluster-name/consumer-group-name
```
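A backend plugin can then split the system name off the front of the annotation value; a minimal sketch (the helper name is illustrative, not an existing Backstage API):

```ts
// Split 'cluster-name/consumer-group-name' into the external system name
// and the reference within that system.
function parseAnnotationValue(value: string): { system: string; ref: string } {
  const [system, ...rest] = value.split('/');
  return { system, ref: rest.join('/') };
}

// parseAnnotationValue('cluster-name/consumer-group-name')
//   => { system: 'cluster-name', ref: 'consumer-group-name' }
```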
multiple external systems with duplicated namespace
Some plugins are currently configured with a list of external systems and their annotation is expected to be present in all of them (i.e. they share a duplicated namespace), such as kubernetes:
```yaml
kubernetes:
  serviceLocatorMethod:
    type: 'multiTenant'
  clusterLocatorMethods:
    - type: 'config'
      clusters:
        - url: http://127.0.0.1:9999
          name: minikube
          authProvider: 'serviceAccount'
          skipTLSVerify: false
          serviceAccountToken: ${K8S_MINIKUBE_TOKEN}
        - url: http://127.0.0.2:9999
          name: aws-cluster-1
          authProvider: 'aws'
    - type: 'gke'
      projectId: 'gke-clusters'
      region: 'europe-west1'
```
This config defines a number of clusters (two for local testing, plus as many as can be found in the gke-clusters project on Google), but if an entity was annotated with `'backstage.io/kubernetes-id': dice-roller`, we would expect to find a `dice-roller` pod in every one of those clusters (let's say dev, test and prod).
If we now imagine 2 departments, each with their own kubernetes clusters, we should not expect to find the same `dice-roller` pod in both departments' clusters, and when we annotate our entity we should instead use `'backstage.io/kubernetes-id': department-a/dice-roller`. For this setup, we would define the departments' clusters in config as follows:
```yaml
kubernetes:
  serviceLocatorMethod:
    type: 'multiTenant'
  clusterGroups:
    - name: department-a
      clusterLocatorMethods:
        - type: 'config'
          clusters:
            - url: http://127.0.0.1:9999
              name: minikube
              authProvider: 'serviceAccount'
              skipTLSVerify: false
              serviceAccountToken: ${K8S_MINIKUBE_TOKEN}
            - url: http://127.0.0.2:9999
              name: aws-cluster-1
              authProvider: 'aws'
        - type: 'gke'
          projectId: 'gke-clusters'
          region: 'europe-west1'
    - name: department-b
      clusterLocatorMethods:
        - type: 'gke'
          projectId: 'gke-clusters-deptB'
          region: 'europe-west1'
```
I suspect this is what `serviceLocatorMethod` is trying to solve, but I'm not familiar with the plans for this field, so comments from people who are would be particularly welcome.
multiple external systems with split namespace
I'm not sure if this is a requirement, but I can imagine plugins which would want to list multiple external systems where some annotation values are present in one and some in the other, and the backend doesn't need to know which as it will try them in turn.
I suggest this is handled similarly to the kubernetes example above.
Alternatives
In addition, plugins could support a hook for returning the appropriate config (perhaps dynamically generated) for a given entity. This would default to reading the first part of the annotation value and looking it up in the config, but could be replaced with something completely dynamic (search another system by entityRef, or use ``{hostname: `${entity.spec.owner}.jenkins.example.com`}``).
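For example, a jenkins backend might accept a hook along these lines (a sketch; `JenkinsConfigSupplier` and the config shape are made up for illustration):

```ts
import { Entity } from '@backstage/catalog-model';

// Given an entity, return the jenkins instance to talk to.
export type JenkinsConfigSupplier = (
  entity: Entity,
) => Promise<{ hostname: string }>;

// A fully dynamic replacement for the default annotation-prefix lookup:
// derive the instance from the entity's owner instead.
export const ownerBasedSupplier: JenkinsConfigSupplier = async entity => ({
  hostname: `${entity.spec?.owner}.jenkins.example.com`,
});
```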
Risks
Outstanding Questions
- Is this just for plugins which have a specific backend plugin and authenticate to the external system as backstage?
- Should we explicitly support a "default" named external system so we can avoid the prefix in the annotation in the simple case? Does this make parsing too ambiguous if the part after the (now optional) `/` could itself contain a `/`? If we don't do this, is backwards compatibility too hard/ugly to maintain? (See the sketch after this list.)
- Do we need to standardise the name of the config property storing the list-of-named-external-systems (`clusters` in the kafka example above) or is this going too far?
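To make the parsing ambiguity in the second question concrete, one possible rule is to only treat the prefix as a system name when it matches a configured system (a sketch; the helper is illustrative):

```ts
// With an optional prefix, 'a/b' could mean system 'a' + value 'b', or the
// default system + value 'a/b'. Only known system names win the prefix.
function resolveAnnotationValue(
  value: string,
  knownSystems: Set<string>,
  defaultSystem: string,
): { system: string; ref: string } {
  const [head, ...rest] = value.split('/');
  if (rest.length > 0 && knownSystems.has(head)) {
    return { system: head, ref: rest.join('/') };
  }
  return { system: defaultSystem, ref: value };
}
```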
Thank you for writing this RFC, totally agree this is something we need more documented patterns and solutions for!
TL;DR: Plugins that require data that is owned by end users should not rely on the Backstage configuration as a source of that data.
I’d like to add a bit of background before diving into more concrete stuff.
This problem is an instance of a common pattern: we need some piece of additional data in our plugin. The data could be just a single value for an entire Backstage deployment, or a couple of values, or maybe one value per entity, or a couple of values per entity. It could also be any of those things, but where we're able to encode a procedure for deriving the data through some convention, like using the GitHub repo name as the component name, or using the entity name as the project name in an external system.
All of these problems tend to have a couple of different possible dimensions, but most importantly they also usually have a specific Backstage user profile that is able to produce the data or decide the convention. For example, it is typically the software engineer (Backstage end user) who decides the name and type of a component entity, while an integrator might decide the integration setup for Backstage with an external SCM system. What roles are involved in a particular problem and what data they supply is really important, because it puts constraints on the solution if we want it to be a generic solution that can support most organizations.
When thinking about this I generally envision a large organization, think 1000+ engineers. The setup might be that 5 of those engineers are part of the Backstage integration team, who manage and configure the Backstage instance along with some core plugins. There might then be another 100 engineers that are plugin contributors, who build tooling that they surface through Backstage plugins. Then there's the rest of the engineers, the Backstage end users; they're by far the majority and typically want to get on with building out features and new systems.
Now let's think about what pieces these different groups control. The integrators generally own the Backstage app and backend package(s), along with the configuration and maybe a couple of core plugins too. The integrators work in the internal Backstage repo and want to be able to do most of their work through that repo. The contributors also work in the same repo or in packages that are brought in, but they want to be able to do most of their work within their plugin, occasionally contributing to the core setup of the Backstage instance. The software engineers use Backstage, but they manage their own projects with their catalog definition files, and don't work in the internal Backstage repo at all.
With this you might begin to see that depending on who the owner of the source of the data is, we're limited in the solutions we can use. For example, if every team has their own jenkins setup that needs to be configured, we don't want to do this in either the Backstage configuration or codebase, because software engineers should not need to care about that project. Now if it's possible to establish a convention for how to find the jenkins builds given a repo name, then that is something that could be encoded in the Backstage repo either by the integrator team or contributors, but it should not be up to each software engineering team to encode their own conventions. We can also flip this around: for example, each software engineering team should not configure their own GitHub integration, as that should instead be set up by the integrators, which is why we keep the GitHub setup in configuration instead of somewhere else.
Now there's no golden rule here that solves everything, or works for everyone. For a small engineering organization it is probably fine to have everything centrally configured in the Backstage repo, which is why I like to approach these problems with the scope of large organizations in mind; that is when we want to spend the time to find a proper solution that works for most people.
The point of this wall of text is really just to explain why when we have things that in some cases vary between different parts of an organization, we often do not want to have a solution where that is configured through the central Backstage configuration, because most teams should not have to interact with the central Backstage repo within an organization.
Will also acknowledge that this RFC isn't really suggesting that we should do that, but I want to have this context in place for further discussion.
With that out of the way I’ll jump in with some actual ideas 😁
More powerful annotations
One thing I’ve been tempted by for a long time is to open up for annotations to not only contain string values, but any kind of JSON value. That would open up for less awkward serialization of things and clearer definitions, for example, instead of
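a flat string encoding along these lines (an illustrative reconstruction, echoing the kubernetes example above):

```yaml
metadata:
  annotations:
    'backstage.io/kubernetes-id': department-a/dice-roller
```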
What if we could have
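a structured value instead (again illustrative; the nested shape is an assumption, not a concrete proposal):

```yaml
metadata:
  annotations:
    backstage.io/kubernetes:
      clusterGroup: department-a
      id: dice-roller
```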
There's plenty of fun to be had with this migration though, so it would have to be considered with care, but I'm pretty tempted by this for many reasons 😁
Interface and code first
I think most of these solutions benefit from defining an interface for how to discover the external systems, which is then depended on by the plugin service, aka router. It can be as simple as this:
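```ts
// A sketch for the kafka example; these type names are illustrative,
// not an existing Backstage API.
export interface ClusterDetails {
  name: string;
  brokers: string[];
}

// The router depends only on this interface, not on how clusters are found.
export interface KafkaClustersSupplier {
  getClusters(): Promise<ClusterDetails[]>;
}
```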
On top of that it’s pretty simple to provide a default implementation that simply reads all clusters from configuration, but already there you provide a bit of flexibility where you might be pulling in the list of clusters from an external system, or can perhaps deduce them from another list of integrations.
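For instance, a config-backed default might look roughly like this (a sketch against the interface above, assuming the `kafka.clusters` config from the proposal):

```ts
import { Config } from '@backstage/config';

export class ConfigClustersSupplier implements KafkaClustersSupplier {
  constructor(private readonly config: Config) {}

  // Reads the kafka.clusters block from app-config.yaml.
  async getClusters(): Promise<ClusterDetails[]> {
    return this.config.getConfigArray('kafka.clusters').map(c => ({
      name: c.getString('name'),
      brokers: c.getStringArray('brokers'),
    }));
  }
}
```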
The Kubernetes plugin implements this pattern for the `KubernetesClustersSupplier` interface, with two different implementations: https://github.com/backstage/backstage/tree/d5bc75163fb91c442a31ec2ac4aba43c4570d8b8/plugins/kubernetes-backend/src/cluster-locator

Where we can start getting a more powerful implementation is when we start providing some input to our `getClusters` method. For example, we might slice out the cluster name from our annotation and pass it on to our method to filter the set of clusters.
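Something like this, perhaps (a sketch; the optional filter parameter is an assumed shape):

```ts
export interface KafkaClustersSupplier {
  // Leaving clusterName undefined returns all known clusters.
  getClusters(options?: { clusterName?: string }): Promise<ClusterDetails[]>;
}
```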
At this point we could also completely drop the configuration but still look up individual clusters. Perhaps there's a simple domain name convention to use, such as `<clusterName>.kafka-clusters.acme.org`, or we could look them up through some other type of service discovery.

The Kubernetes plugin implements this pattern for the `KubernetesServiceLocator` interface, with a single built-in implementation here: https://github.com/backstage/backstage/blob/d5bc75163fb91c442a31ec2ac4aba43c4570d8b8/plugins/kubernetes-backend/src/service-locator/MultiTenantServiceLocator.ts#L21

Though my favorite idea is to take this a step further and pass in entire entities and use them as a filter parameter.
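Roughly along these lines (again a sketch; the exact signature is an assumption):

```ts
import { Entity } from '@backstage/catalog-model';

export interface KafkaClustersSupplier {
  // The entity acts purely as a filter: an implementation maps it (via
  // annotations, owner, or any other convention) onto a subset of clusters.
  getClusters(options?: { entity?: Entity }): Promise<ClusterDetails[]>;
}
```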
We don't pass the entire entity over the wire, just the ref (`<kind>:<namespace>/<name>`), and the backend then fetches the entire entity from the catalog, which might also end up being very important for security. Our method is still a regular list-all endpoint, so if we have a plugin where we want both a global list page and individual entity pages, we can have that too. The entity we pass on simply acts as a powerful filter parameter, where the implementation is able to transform the entity definition into something that selects a subset of all our clusters.

With this we should have pretty much complete flexibility in the implementation. We can have a dead-simple config-only one that always just returns the list of all clusters. We can have one that parses the standard annotation that's used in the Backstage open source project. We can read a completely different internal annotation that might be reusable for several different resources that are tied to an entity, or we might just use the entity name itself and look up the cluster through some other convention.
It’s not really a full exploration, but I did try out a very lightweight version of this in the TODO plugin: https://github.com/backstage/backstage/blob/d5bc75163fb91c442a31ec2ac4aba43c4570d8b8/plugins/todo-backend/src/service/router.ts#L49