Improve parameterization of transaction names
Describe the idea
We need to revisit how the JS SDK captures transactions (URL routes) and sends them to Sentry.
Reasons for doing this and things we need to consider:
- parameterization: We should parameterize URLs whenever possible, because:
- grouping: the transaction name influences transaction grouping; raw URL names lead to ungrouped transactions
- indexing and high cardinality: Raw URL transaction names lead to a high cardinality of transactions (because they’re not grouped)
- PII: Raw URL transaction names can contain sensitive data/PII (e.g. IDs, auth tokens, etc.)
- Dynamic Sampling: Propagating raw URLs in DSC vs. showing parameterized routes in the Sentry UI (and the DS settings) creates a lot of user confusion
Our best-effort approach has been decent so far, but the fallback to an unparameterized route or the whole URL may be suboptimal.
Examples:
- Low-cardinality transaction name:
{"transaction": "/users/{username}", "transaction_source": "route"}
- Presumably a high-cardinality transaction name:
{"transaction": "/users/123235", "transaction_source": "uri"}
- User-defined transaction name, cardinality unknown:
{"transaction": "my_transaction_name", "transaction_source": "custom"}
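The three cases above can be sketched as a small helper that attaches a source to each name. This is only an illustration: `TransactionNameInfo`, `pickTransactionName`, and the precedence (custom name, then matched route, then raw path) are assumptions made for this sketch, not the SDK's actual API. The source values are copied from the examples above.

```typescript
// Hypothetical types for illustration; not the SDK's real definitions.
type TransactionSource = "route" | "uri" | "custom";

interface TransactionNameInfo {
  transaction: string;
  transaction_source: TransactionSource;
}

// Assumed precedence: user-defined name > parameterized route > raw path.
function pickTransactionName(
  rawPath: string,
  matchedRoute?: string,
  customName?: string,
): TransactionNameInfo {
  if (customName !== undefined) {
    // Cardinality unknown: the user chose the name.
    return { transaction: customName, transaction_source: "custom" };
  }
  if (matchedRoute !== undefined) {
    // Low cardinality: the router gave us a parameterized route.
    return { transaction: matchedRoute, transaction_source: "route" };
  }
  // Presumably high cardinality: fall back to the raw path.
  return { transaction: rawPath, transaction_source: "uri" };
}
```

Carrying the source alongside the name lets downstream consumers decide how much to trust the name's cardinality.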
Requirement: what should be sent and where?
What: transaction (string)
Where:
- transaction envelope payload
- error envelope payload
- in the envelope header (sentry-transaction, trace Envelope Header)
Possible implementation
There are a couple of things we can do or at least check:
Existing Routing Instrumentations with parameterization
As listed in #5345, we have a lot of popular routers covered with routing instrumentations. However, we might be able to improve parameterization in some of them. Hence, for each instrumentation:
- check if there is a way to parameterize earlier
- try to match routes better (if it turns out that there are cases where our current matching fails)
- (send source information; tracked in #5345)
Existing Routing Instrumentations without parameterization
TODO: Check to which routers this applies
There are some routing instrumentations that don’t parameterize currently.
- add parameterization whenever possible
TBD: Approximative Parameterization
This has been discussed quite a bit in the past, but given that we have to make our best effort at parameterization, let’s revisit this topic. The idea is seemingly simple: we could add a mechanism that takes a raw URL and tries to guess which parts of it might be parameters (e.g. IDs, tokens, etc.). The mechanism would then replace these parts with a generic param placeholder.
Example:
/users/1235/credentials
==> /users/:id/credentials
There are a lot of possible issues with this: there are going to be loads of edge cases where the approximation is off or misses parameters completely.
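A minimal sketch of such a mechanism, assuming that purely numeric segments, UUIDs, and long hex strings are the only things treated as parameters. The function name `approximateParameterization`, the patterns, and the `:id` placeholder are all hypothetical choices for this sketch, not an existing SDK feature.

```typescript
// Heuristic patterns for path segments that look like parameters.
// These are assumptions for illustration and are far from exhaustive.
const NUMERIC_ID = /^\d+$/;
const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const LONG_HEX = /^[0-9a-f]{16,}$/i;

function looksLikeParam(segment: string): boolean {
  return NUMERIC_ID.test(segment) || UUID.test(segment) || LONG_HEX.test(segment);
}

function approximateParameterization(rawUrl: string): string {
  // Drop the query string and fragment; only the path is considered.
  const [path = ""] = rawUrl.split(/[?#]/);
  return path
    .split("/")
    .map((segment) => (looksLikeParam(segment) ? ":id" : segment))
    .join("/");
}

// Matches the example above.
console.log(approximateParameterization("/users/1235/credentials"));
// Missed parameter: a username does not look like an ID.
console.log(approximateParameterization("/users/jane/credentials"));
// Possible false positive: a year in an archive path is replaced too.
console.log(approximateParameterization("/archive/2022"));
```

The last two calls show the kind of edge cases mentioned above: string-valued parameters slip through, while legitimately static numeric segments get replaced.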
Why is this challenging?
- In the product, we have to explain why the DSC might carry a different (or no) transaction name from what’s visible in the UI
- We have to explain why unparameterized routes, and even full URLs, are sent.
- Some frameworks and routers provide unique challenges
- What about custom routing instrumentations we don’t control?
Places to improve parameterization
- nextjs (@lobsterkatie) https://github.com/getsentry/sentry-javascript/issues/5505
- express integration (#5450, @Lms24)
- angular (#5416, @Lms24)
- vue (I think Vue is pretty much optimal, @lforst)
- react router (pretty much optimal)
- react router v6 https://github.com/getsentry/sentry-javascript/pull/5515
- without framework (https://github.com/getsentry/sentry-javascript/pull/5411, @lforst)
- remix https://github.com/getsentry/sentry-javascript/pull/5491
- Ember
- Gatsby
Top GitHub Comments
Hi @jamesbvaughan, would you mind opening a dedicated issue for this? Makes it easier to track this for us. cc @AbhiPrasad maybe you have a quick idea?
Always happy to refine/discuss this. Since this became a very important issue, we should sync with everyone here who has context on how transaction names/parameterization/routing instrumentation currently works.