Incorrect currentTransaction context during parallel execution
Describe the bug
During parallel execution of code that is wrapped in custom transaction/span monitors, the `currentTransaction` reference is misleading: in every parallel flow it points to the first transaction that was created. I’m using a custom wrapper that surrounds async/sync code and starts/ends a transaction or span automatically. It decides which one to start based on whether the `currentTransaction` object exists, so that is where the issues start.
To Reproduce
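Simplified wrapper function:

```ts
// `apm` is the started elastic-apm-node agent
const monitorAsyncWrapper = (fx: Function, name: string) => {
  // Start a span if a transaction is already current, otherwise start a new transaction
  const apmMonitor = apm.currentTransaction
    ? apm.startSpan(name)
    : apm.startTransaction(name)
  console.log(`creating new [${apmMonitor?.constructor.name}] - ${apmMonitor?.traceparent}`)
  // Run the wrapped async function and end the monitor once it resolves
  // (a rejection leaves the monitor open)
  return fx.apply(null)
    .then((result: any) => {
      apmMonitor!.end()
      console.log(`ending [${apmMonitor?.constructor.name}] - ${apmMonitor?.traceparent}`)
      return result
    })
}
```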
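Desired behavior: the APM output for parallel execution should be the same as for the following serial execution:

```ts
(async () => {
  for (let i = 0; i < 50; i++) {
    await monitorAsyncWrapper(async () => {
      // This inner call is not awaited, so the outer transaction ends before its span
      monitorAsyncWrapper(async () => {
        // setTimeout() returns a timer handle, not a promise, so this await does not wait for the timer
        await setTimeout(() => {
          return Math.pow(2, 2)
        }, 100)
      }, 'power calculation')
    }, 'Calculations')
  }
})();
```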
Console output, showing that transactions and spans are created one after another as expected:

```
creating new [Transaction] - 00-e90ba39596ee8cd9173880a026e9d4bb-c2e9044e0caa4252-01
creating new [Span] - 00-e90ba39596ee8cd9173880a026e9d4bb-4c0e9b8bc4a05261-01
ending [Transaction] - 00-e90ba39596ee8cd9173880a026e9d4bb-c2e9044e0caa4252-01
ending [Span] - 00-e90ba39596ee8cd9173880a026e9d4bb-4c0e9b8bc4a05261-01
creating new [Transaction] - 00-e6f43777ee59a0f33b20b9a0c0970b99-f7305732fb1ede7d-01
creating new [Span] - 00-e6f43777ee59a0f33b20b9a0c0970b99-1c016033375384eb-01
ending [Transaction] - 00-e6f43777ee59a0f33b20b9a0c0970b99-f7305732fb1ede7d-01
ending [Span] - 00-e6f43777ee59a0f33b20b9a0c0970b99-1c016033375384eb-01
creating new [Transaction] - 00-7a16cba0e47ccc83e15e29b9a40244a6-3aab7f71c0d79ad9-01
creating new [Span] - 00-7a16cba0e47ccc83e15e29b9a40244a6-a0f4000911136a7a-01
ending [Transaction] - 00-7a16cba0e47ccc83e15e29b9a40244a6-3aab7f71c0d79ad9-01
```
Actual behavior with parallel execution:

```ts
(async () => {
  const promises = []
  for (let i = 0; i < 50; i++) {
    promises.push(
      monitorAsyncWrapper(async () => {
        await monitorAsyncWrapper(async () => {
          // setTimeout() returns a timer handle, not a promise, so this await does not wait for the timer
          await setTimeout(() => {
            return Math.pow(2, 2)
          }, 100)
        }, 'power calculation')
      }, 'Calculations')
    )
  }
  await Promise.all(promises)
})();
```
Console output, showing how all the promises are bound to the first transaction that was created:

```
creating new [Transaction] - 00-e48b38ebd2f1965c650261c49b7523ab-d3964644462375cd-01
creating new [Span] - 00-e48b38ebd2f1965c650261c49b7523ab-fe30d468e7ddb7c5-01
creating new [Span] - 00-e48b38ebd2f1965c650261c49b7523ab-3e199361bc1e777b-01
creating new [Span] - 00-e48b38ebd2f1965c650261c49b7523ab-ee64926800a91a5e-01
creating new [Span] - 00-e48b38ebd2f1965c650261c49b7523ab-d44e3b0cf8812f69-01
creating new [Span] - 00-e48b38ebd2f1965c650261c49b7523ab-d95df8233fae9c0f-01
creating new [Span] - 00-e48b38ebd2f1965c650261c49b7523ab-959358be41ac811d-01
creating new [Span] - 00-e48b38ebd2f1965c650261c49b7523ab-e2b9f28a92b22095-01
creating new [Span] - 00-e48b38ebd2f1965c650261c49b7523ab-337337dc5fdd847c-01
creating new [Span] - 00-e48b38ebd2f1965c650261c49b7523ab-8c71c46bbe8b5e9b-01
ending [Span] - 00-e48b38ebd2f1965c650261c49b7523ab-fe30d468e7ddb7c5-01
ending [Span] - 00-e48b38ebd2f1965c650261c49b7523ab-ee64926800a91a5e-01
```
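A side note on the repro loops: `await setTimeout(() => ..., 100)` does not actually wait 100 ms. The global `setTimeout()` returns a timer handle rather than a promise, so the `await` resumes on the next microtask and the callback's return value is discarded. If a real delay was intended, Node.js's promise-based timer from `node:timers/promises` (available since Node.js 15, so not on the 14.17.4 tested here) is one option. The sketch below assumes that module; `delayedPower` is a hypothetical helper name. The monitor-creation problem in this issue happens before the first `await`, so this change alone would not affect it.

```typescript
import { setTimeout as sleep } from 'node:timers/promises'

// Hypothetical helper: wait `ms` milliseconds, then compute 2^2.
// Unlike `await setTimeout(cb, ms)`, the promise returned by `sleep(ms)`
// only resolves once the timer has fired.
async function delayedPower(ms: number): Promise<number> {
  await sleep(ms)
  return Math.pow(2, 2)
}

// In the repro this could be used as the inner body, e.g.
// await monitorAsyncWrapper(() => delayedPower(100), 'power calculation')
delayedPower(100).then((value) => console.log(value))
```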
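Expected behavior

In the wrapper's check

```ts
const apmMonitor = apm.currentTransaction
  ? apm.startSpan(name)
  : apm.startTransaction(name)
```

the `apm.currentTransaction` object should be `null` for every top-level `'Calculations'` promise, so that each one starts its own transaction.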
Environment (please complete the following information)
- OS: Linux
- Node.js version: tested on 16.2.0, 14.17.4
- APM Server version: 7.11.1
- Agent version: 3.19
How are you starting the agent? (please tick one of the boxes)
- Calling `agent.start()` directly (e.g. `require('elastic-apm-node').start(...)`)
- Requiring `elastic-apm-node/start` from within the source code
- Starting node with `-r elastic-apm-node/start`
Additional context

Agent config options:

```
ELASTIC_APM_LOGGER=false
module.exports = { serverUrl: 'http://localhost:8200' }
```

package.json dependencies:

```
"dependencies": { "@elastic/ecs-pino-format": "^1.0.0", "elastic-apm-node": "^3.12.1" }
```
Top GitHub Comments
plays
`i` in each transaction and span name is actually a bad practice, because these values should have reasonably low cardinality. One side effect of this is that a “metricset” is gathered for each `transaction.name` and is sent as well. So in addition to 800 transactions and 800 spans, the agent is attempting to send 800 metricsets. Reducing `transaction.name` cardinality and/or using `metricsInterval=0` to disable metrics can help mitigate this somewhat.

`ELASTIC_APM_MAX_QUEUE_SIZE` is relevant here now. This `maxQueueSize` is a defense mechanism intended to avoid overloading the upstream APM server under very high tracing load from the user application. However, in this case the burst of tracing load is fast enough (1600 tracing objects are ended at roughly the same time) that even the APM agent’s serialization and sending of events upstream does not keep up. I think it is fair that the agent limits its impact on the user app by dropping events in this case.

I’ve added https://github.com/elastic/apm-agent-nodejs/issues/2192 for the docs suggestion. Thanks!
I’m closing this issue now. Feel free to open new ones or ask on our discuss forum if you have other Qs or issues.
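A tiny illustration of the cardinality point in the comment about `transaction.name`: putting the loop counter in the name turns a burst of 800 transactions into 800 distinct names, whereas a fixed name yields one. Going by that comment, each distinct name gets its own metricset, so the count is a rough indicator of how many metricsets the burst adds. `distinctTransactionNames` is a hypothetical helper for illustration, not part of the agent.

```typescript
// Hypothetical helper: how many distinct transaction names a naming scheme
// produces for `n` transactions.
function distinctTransactionNames(n: number, nameFor: (i: number) => string): number {
  return new Set(Array.from({ length: n }, (_, i) => nameFor(i))).size
}

console.log(distinctTransactionNames(800, (i) => `Calculations-${i}`)) // 800
console.log(distinctTransactionNames(800, () => 'Calculations')) // 1
```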
@david-sykora Thanks for the issue with the excellent detail and repro!
I made some changes to your script for discussion:

- added `i` to the span/transaction names and printed those in the `console.log`s (names are easier to read than the generated IDs)
- added `async_hooks.executionAsyncId()` to each `console.log` to help show when code runs in the same task (on the Node.js event loop)

Running this as is results in a trace something like this:
Notice that the creation of every `apmMonitor` in `monitorAsyncWrapper` runs in the same event-loop task (`[xid=1]`). (An “event-loop task” can be identified with Node.js’s `async_hooks.executionAsyncId()`.) In fact, everything down to the `setTimeout(...)` call executes in that same event-loop task.

The APM agent tracks a current transaction and current span per event-loop task. So the first call to `monitorAsyncWrapper` (“Calculations-0”) results in `apm.startTransaction()`, and that becomes the `apm.currentTransaction` for async task `1`. All the subsequent `apmMonitor` variables end up calling `apm.startSpan()`.

Forcing each `apmMonitor` into a separate async task

If we uncomment the `await Promise.resolve()` line, this forces the rest of `monitorAsyncWrapper`’s body into a separate async task. Notice how each “creating” is now in a separate async task ID. The resulting trace has one transaction per “Calculations” call, with “power calculation” as its span, like you wanted.
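The task-ID observation above can be reproduced without the agent. Below is a minimal sketch that assumes only Node.js's `node:async_hooks`; `probe` and `runProbes` are hypothetical names. It records `executionAsyncId()` before and after an `await Promise.resolve()` for several calls started from one synchronous loop. Node.js only assigns precise async IDs to promise continuations once some async hook is enabled (the APM agent installs its own), which is why the sketch enables an empty one.

```typescript
import { createHook, executionAsyncId } from 'node:async_hooks'

// Enabling any hook turns on precise async IDs for promise continuations.
createHook({ init() {} }).enable()

type TaskIds = { before: number; after: number }

// Hypothetical probe: record the async task ID before and after yielding once.
async function probe(out: TaskIds[]): Promise<void> {
  const before = executionAsyncId()
  await Promise.resolve() // the rest of this body runs in a new async task
  out.push({ before, after: executionAsyncId() })
}

// Start `n` probes from one synchronous loop, like the parallel repro.
async function runProbes(n: number): Promise<TaskIds[]> {
  const out: TaskIds[] = []
  const pending: Promise<void>[] = []
  for (let i = 0; i < n; i++) pending.push(probe(out))
  await Promise.all(pending)
  return out
}

runProbes(3).then((ids) => console.log(ids))
```

All `before` IDs are the same, because every call starts in the same event-loop task, while each `after` ID differs. That per-call separation is what lets the `await Promise.resolve()` workaround give each `apmMonitor` its own context.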
Notes

This `await Promise.resolve()` is a bit of a hack. It is a necessary workaround for the APM agent’s current `apm.startTransaction()` and `apm.startSpan()` APIs, which change the current context instead of taking a function scope to run in a new context, something like:

We are doing some work in this area, but there is no timeline yet for a new API like `withSpan(..., fn)`.
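To illustrate the difference between changing the current context in place and a function-scoped API, here is a small model built on Node.js's `AsyncLocalStorage`. It is not the elastic-apm-node implementation or API: `currentTx`, `startInPlace`, `withMonitor`, `parallelInPlace` and `parallelScoped` are hypothetical names. `enterWith()` stands in for today's `startTransaction()`, since it changes the context for the rest of the current synchronous execution, and `run()` stands in for a `withSpan(..., fn)`-style API, since it applies the new context only to `fn` and the async work it starts.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks'

// Hypothetical model, NOT the elastic-apm-node implementation: the name of the
// "current transaction" for the current async context (null = none).
const currentTx = new AsyncLocalStorage<string | null>()

// Mimics today's pattern: decide from the current context and, when starting a
// transaction, change that context in place for the rest of this sync execution.
function startInPlace(name: string): string {
  const tx = currentTx.getStore()
  if (tx) return `Span(${name} in ${tx})`
  currentTx.enterWith(name)
  return `Transaction(${name})`
}

// Hypothetical scoped helper in the spirit of `withSpan(..., fn)`: a new
// transaction is current only inside `fn` and the async work it starts.
function withMonitor<T>(name: string, fn: (label: string) => Promise<T>): Promise<T> {
  const tx = currentTx.getStore()
  if (tx) return fn(`Span(${name} in ${tx})`)
  return currentTx.run(name, () => fn(`Transaction(${name})`))
}

// Three "Calculations" started in one loop with the in-place pattern, optionally
// with the `await Promise.resolve()` workaround. run(null, ...) starts each demo
// from "no current transaction".
function parallelInPlace(workaround: boolean): Promise<string[]> {
  return currentTx.run(null, async () => {
    const labels: string[] = []
    await Promise.all(
      [0, 1, 2].map(async (i) => {
        if (workaround) await Promise.resolve() // continue in a new async task
        labels.push(startInPlace(`Calculations-${i}`))
      }),
    )
    return labels
  })
}

// The same three calls, each with a nested "power calculation", using the scoped helper.
function parallelScoped(): Promise<string[]> {
  return currentTx.run(null, async () => {
    const labels: string[] = []
    await Promise.all(
      [0, 1, 2].map((i) =>
        withMonitor(`Calculations-${i}`, async (outer) => {
          labels.push(outer)
          await withMonitor('power calculation', async (inner) => {
            labels.push(inner)
          })
        }),
      ),
    )
    return labels
  })
}

parallelInPlace(false).then((labels) => console.log(labels))
```

With the in-place pattern, three calls started in one loop produce one transaction and two spans, which mirrors the behavior reported in this issue. With the `await Promise.resolve()` workaround, or with the scoped helper, each call gets its own transaction, and in the scoped version the nested "power calculation" becomes a span of its own transaction.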