Thoughts on allowing service override at beforeSend
Hey folks, with the shipping of https://github.com/DataDog/browser-sdk/pull/1601, using the RUM SDK got a ton better for maintaining good observability and reliability in federated environments 🙏 I very much appreciate the work there.
Regarding that, I do have a question/use-case I want to run by you folks.
The context
While I’ve seen the same situation in some other big projects, I’m mostly biased by my company’s case. For a bit of context: we are using React with, at the moment, one single codebase for an SPA with 200+ contributors, and we are working towards a better federated story for it.
- For the context of this thread, a “feature” can be considered an isolated component, providing some type of user experience, owned by a single team.
`browser-rum` is a great way to gather real user data on a per-page basis and, afterwards, to analyze users’ behaviour in an application.
First case
Let’s say a specific page is composed of “2 features”, and those features are owned by 2 different teams.
If we want to track actions/errors/resources/etc. but visualize them in the context of those features, things get a bit more complex. `browser-rum` does expose a set of building blocks that allows us, with some data massaging in `datadogRum.init`’s `beforeSend` hook, to enrich the context of events that were triggered/executed on specific features.
In our case, I’m dynamically creating React contexts and sending component information down the pipeline, in order to attach some “feature context” to events in the `beforeSend` hook.
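To make the idea concrete, here is a minimal sketch of such a `beforeSend` enrichment. The event shape, the `FeatureInfo` type, and the `makeBeforeSend` / resolver names are hypothetical and only stand in for whatever the React context layer actually provides; the only thing taken from the SDK is that `beforeSend` receives the event and may modify its `context`.

```typescript
// Minimal local shapes for illustration; the real SDK event types are richer.
interface RumEventLike {
  type: string;
  context?: Record<string, unknown>;
}

// Hypothetical description of a "feature" as defined in this thread.
interface FeatureInfo {
  name: string;
  team: string;
}

// Hypothetical resolver: decides which feature (if any) an event belongs to.
type FeatureResolver = (
  event: RumEventLike,
  domainContext: unknown
) => FeatureInfo | undefined;

// Builds a beforeSend-style callback that attaches the resolved feature
// to the event context and always keeps the event.
function makeBeforeSend(resolveFeature: FeatureResolver) {
  return (event: RumEventLike, domainContext: unknown): boolean => {
    const feature = resolveFeature(event, domainContext);
    if (feature) {
      event.context = { ...event.context, feature };
    }
    return true;
  };
}

// Assumed wiring (not executed here):
// datadogRum.init({ /* applicationId, clientToken, ... */ beforeSend: makeBeforeSend(myResolver) });
```

The attached `feature` object then shows up as event context, which is what makes the events visible per feature but, as described next, does not turn a feature into a first-class service.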
And while the solution “works” in the sense that we are able to visualize RUM events with contextual information attached, it does not allow for good Datadog integrations.
Second Case
Another example is our “navigation header”. There’s a lot going on in ours: queries, contexts, A/B tests, feature flags, tracking, etc. The header itself is shown on almost all of our pages, so any error in it will surface on multiple pages. And while we are using the same React context abstractions we created to inject contextual information, it is very hard for the responsible frontend engineers to properly leverage Datadog (especially compared to how much backend engineers can leverage it).
The final part 😅
So the situation is that, more and more, especially for teams on a federated frontend architecture, the concept of “the URL” is less useful for consolidating application information.
Lately I’ve had questions thrown my way like:
- Can we leverage services for showing our errors?
- How can I set up SLOs for my frontend features?
- My feature is used in 2 apps (think 2 different subdomains) how can I consolidate their information?
The actual questions
So, not to be the buzzkill about the PR enabling service version update: it’s awesome and we’ll look into leveraging it right away :p
To the actual questions
- First of all… is our approach wrong? Should it be different/better?
- Is there a preferred way to map events to a specific service that does not depend on a view?
- Are there suggestions/recommendations on how to structure reliability/observability for federated frontend applications?
- Like a “Datadog seal of approval for X or Y approach”, or “Do’s and Don’ts of RUM + federation”
- Will you folks consider that? 🙏
- Have you folks considered framework-specific implementations that could make this easier for folks?
- and finally… Is your roadmap public? 😅 Curious on what you folks are planning on next
Top GitHub Comments
I have a similar issue, and I tried to override the `service` in `beforeSend` too. That’s what brought me here.
That approach could work for us, since based on the file where the error occurred we would be able to identify the “service”.
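As a rough illustration of identifying the “service” from the file where an error occurred, the sketch below matches stack trace lines against a list of rules. The `ServiceRule` shape, the function name, and any URL patterns are hypothetical; it also assumes the error’s stack string is available (for example on the error event) and that each bundle is served from a distinguishable path.

```typescript
// Hypothetical rule: any stack line matching `pattern` belongs to `service`.
// Patterns should not use the `g` flag, since RegExp.test is stateful with it.
interface ServiceRule {
  pattern: RegExp;
  service: string;
}

// Returns the service of the topmost stack line that matches any rule,
// or undefined when there is no stack or nothing matches.
function serviceFromStack(
  stack: string | undefined,
  rules: ServiceRule[]
): string | undefined {
  if (!stack) {
    return undefined;
  }
  for (const line of stack.split("\n")) {
    const rule = rules.find((r) => r.pattern.test(line));
    if (rule) {
      return rule.service;
    }
  }
  return undefined;
}
```

Taking the topmost matching frame attributes the error to the code that threw it rather than to whichever shell or framework code called into it; whether that is the right attribution is a judgment call for each setup.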
Hey @BenoitZugmeyer
In our case, we have different “services” on the same page. We have a micro frontend structure. So a single page can have multiple codebases in there, with independent builds and deploys.
Trying to add a `service` in `context` did not work for me; I was not able to filter errors in the dashboards that way.
My current approach, as a test, is to actually capture the error event myself, execute `startView` with the correct service name, and then `addError`. That’s why I mentioned that being able to override the service when calling `addError` would be helpful until we get a better solution.
If you want to better understand our use case, we use https://single-spa.js.org/
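The workaround described above could look roughly like the following sketch. `RumLike` is a hypothetical minimal interface covering only the two calls mentioned (`startView` with a service name, then `addError`), so that the helper can be exercised with a fake; passing the real `datadogRum` object is an assumption, not something verified here.

```typescript
// Hypothetical minimal view of the SDK surface this workaround needs.
interface RumLike {
  startView(options: { name: string; service?: string }): void;
  addError(error: unknown, context?: object): void;
}

// Starts a view carrying the owning service, then reports the error,
// so that the error is collected under that service.
function reportErrorForService(
  rum: RumLike,
  error: unknown,
  viewName: string,
  service: string
): void {
  rum.startView({ name: viewName, service });
  rum.addError(error);
}

// Assumed usage (not executed here):
// reportErrorForService(datadogRum, err, window.location.pathname, "checkout");
```

Note that every call starts a new view, so one page visit may end up split across several views. That side effect is part of why an override on `addError` itself would be a cleaner solution.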