Thoughts on allowing service override at beforeSend

Hey Folks, With the shipping of https://github.com/DataDog/browser-sdk/pull/1601 the RUM SDK usage got a ton better for maintaining good observability and reliability on federated environments 🙏 I very much appreciate the work there.

Regarding that, I do have a question/use-case I want to run by you folks.

The context

While I’ve seen the same situation in some other big projects, I’m mostly biased by my company’s case. For a bit of context: we are using React, at the moment a single codebase, an SPA, with 200+ contributors, and we are working towards a better federated story for it.

For the context of this thread, a “feature” can be considered an isolated component, providing some type of user experience, owned by a single team.

browser-rum is a great way to gather real user data on a per-page basis and afterwards analyze users’ behaviour in an application.

First case

Let’s say we have a specific page that is composed of “2 features”, and those features are owned by 2 different teams.

If we want to track actions/errors/resources/etc but visualize them in the context of those features, things get a bit more complex. browser-rum does expose a set of building blocks that allows us, with some data massaging in datadogRum.init’s beforeSend hook, to enrich the context of events that were triggered/executed on specific features.

In our case, I’m dynamically creating React contexts and sending component information down the pipeline, in order to attach some “feature context” to events in the beforeSend hook.

And while the solution “works” in the sense that we are able to visualize RUM events with contextual information attached, it does not allow for good datadog integrations.
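The enrichment described above could look roughly like the following. This is only a minimal sketch, not the actual implementation from our codebase: RumEventLike is a local stand-in for the SDK's event type, and FeatureInfo, setActiveFeature and attachFeatureContext are hypothetical names made up for illustration.

```typescript
// Minimal local stand-in for the part of a RUM event that this sketch touches.
// The real SDK event type is richer; only `type` and `context` are assumed here.
interface RumEventLike {
  type: string;
  context?: Record<string, unknown>;
}

// Hypothetical shape of the "feature context" a React provider would publish.
interface FeatureInfo {
  name: string;
  team: string;
}

// Hypothetical holder: a React context provider would call setActiveFeature
// when one of its children handles an interaction or throws.
let activeFeature: FeatureInfo | undefined;

function setActiveFeature(feature: FeatureInfo | undefined): void {
  activeFeature = feature;
}

// Candidate for the `beforeSend` option of datadogRum.init.
// Returning true keeps the event; returning false would discard it.
function attachFeatureContext(event: RumEventLike): boolean {
  if (activeFeature) {
    // Keep whatever context is already on the event and add the feature to it.
    event.context = { ...event.context, feature: activeFeature };
  }
  return true;
}
```

Assuming a setup like this, attachFeatureContext would be passed as the beforeSend option of datadogRum.init. The feature then shows up as an event attribute, but the event still belongs to the service configured at init time, which is the limitation described here.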

Second Case

Another example is our “navigation header”. There’s a lot that goes on in ours: queries, contexts, a/b tests, feature flags, tracking, etc. The header itself is shown on almost all of our pages, so any error surfacing in it will show up on multiple pages. And while we are using the same React-Context abstractions we created to inject contextual information, it is very hard for the responsible frontend engineers to properly leverage DataDog (especially compared to how much backend engineers can leverage it).

The final part 😅

So the situation is that more and more, especially for teams on a federated frontend architecture, the concept of “the url” is less useful for consolidating application information.

Lately I’ve got questions thrown my way like:

- Can we leverage services for showing our errors?
- How can I set up SLOs for my frontend features?
- My feature is used in 2 apps (think 2 different subdomains). How can I consolidate their information?

The actual questions

So, not to be a buzzkill about the PR enabling the service version update: it’s awesome and we’ll look into leveraging it right away :p

To the actual questions

- First of all… is our approach wrong? Should it be different/better?
- Is there a preferred way to map events to a specific service that does not depend on a view?
- Are there suggestions/recommendations on how to structure reliability/observability for federated frontend applications?
  - Like a “datadog seal of approval for X or Y approach”, or “Do’s and Don’ts of RUM + federation”
  - Will you folks consider that? 🙏
- Have you folks considered framework-specific implementations that could make this easier for folks?
- And finally… is your roadmap public? 😅 Curious about what you folks are planning next.

Issue Analytics

Created a year ago
Comments: 6 (1 by maintainers)

Top GitHub Comments

1 reaction

felipetoffolo-toast commented, Jul 28, 2022

I have a similar issue. And I tried to override the service in the beforeSend too. That’s what brought me here.

That approach could work for us since, based on the file where the error occurred, we would be able to identify the “service”.
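To illustrate what identifying the “service” from the file could mean, here is a rough sketch. It assumes each codebase is served from its own path, so the bundle URL in the stack trace gives the owner away. The serviceByBundle table, its path fragments and service names, and the serviceFromStack helper are all hypothetical.

```typescript
// Hypothetical mapping from a bundle path fragment to the owning service.
// The fragments and service names are illustrative only.
const serviceByBundle: ReadonlyArray<[string, string]> = [
  ["/checkout/", "checkout-frontend"],
  ["/header/", "navigation-header"],
];

// Returns the service of the first table entry whose fragment appears
// anywhere in the stack trace, or undefined when no fragment matches.
function serviceFromStack(stack: string | undefined): string | undefined {
  if (!stack) {
    return undefined;
  }
  for (const [fragment, service] of serviceByBundle) {
    if (stack.indexOf(fragment) !== -1) {
      return service;
    }
  }
  return undefined;
}
```

The lookup takes the first matching table entry rather than the topmost stack frame, so a stack that crosses several codebases may need a stricter rule.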

0reactions

felipetoffolo-toast commented, Aug 3, 2022

Hey @BenoitZugmeyer

In our case, we have different “services” on the same page. We have a micro frontend structure. So a single page can have multiple codebases in there, with independent builds and deploys.

Trying to add a service in context did not work for me; I was not able to filter errors in the dashboards that way.

My current approach as a test is to actually capture the error event myself, execute startView with the correct service name and then addError. That’s why I mentioned that being able to override the service when calling addError would be helpful until we get a better solution.
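A rough sketch of that test approach is below. The RumLike interface is a local stand-in that is assumed to mirror datadogRum.startView (with the service option) and datadogRum.addError, so check it against the SDK version you use. reportErrorForService is a hypothetical helper name.

```typescript
// Subset of the RUM API this sketch relies on. The shapes are assumed to
// mirror datadogRum.startView and datadogRum.addError; verify them against
// the SDK version in use.
interface RumLike {
  startView(options: { name: string; service?: string; version?: string }): void;
  addError(error: unknown, context?: object): void;
}

// Hypothetical helper: start a view tagged with the owning service, then
// report the error while that view is the current one.
function reportErrorForService(
  rum: RumLike,
  error: unknown,
  service: string,
  viewName: string
): void {
  rum.startView({ name: viewName, service });
  rum.addError(error);
}
```

Starting a new view for every captured error splits the session into extra views, which is why this is only a stopgap and not something I would recommend as a pattern.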

If you want to better understand our use case, we use https://single-spa.js.org/