Disconnected HTTP2 sessions cause a crash when reporting transactions
Describe the bug
We’re using the agent to instrument gRPC calls. Since @grpc/grpc-js uses the internal Node HTTP/2 server, the agent is able to auto-instrument all incoming requests. We’ve also added additional instrumentation information ourselves.
We’ve found that very intermittently, some requests are causing process crashes. We’ve had to disable APM as a result.
Error thrown:
Error [ERR_HTTP2_SOCKET_UNBOUND]: The socket has been disconnected from the Http2Session
at new NodeError (node:internal/errors:371:5)
at Object.get (node:internal/http2/core:867:17)
at Object.getContextFromRequest (/opt/app/node_modules/elastic-apm-node/lib/parsers.js:24:32)
at Transaction.toJSON (/opt/app/node_modules/elastic-apm-node/lib/instrumentation/transaction.js:206:41)
at Transaction._encode (/opt/app/node_modules/elastic-apm-node/lib/instrumentation/transaction.js:237:15)
at Instrumentation.addEndedTransaction (/opt/app/node_modules/elastic-apm-node/lib/instrumentation/index.js:315:63)
at Transaction.end (/opt/app/node_modules/elastic-apm-node/lib/instrumentation/transaction.js:288:32)
at ServerHttp2Stream.<anonymous> (/opt/app/node_modules/elastic-apm-node/lib/instrumentation/modules/http2.js:80:17)
at ServerHttp2Stream.f (/opt/app/node_modules/once/once.js:25:25)
at ServerHttp2Stream.onend (/opt/app/node_modules/end-of-stream/index.js:36:27)
at /opt/app/node_modules/elastic-apm-node/lib/instrumentation/run-context/AbstractRunContextManager.js:76:49
at AsyncHooksRunContextManager.with (/opt/app/node_modules/elastic-apm-node/lib/instrumentation/run-context/BasicRunContextManager.js:49:17)
at ServerHttp2Stream.wrapper (/opt/app/node_modules/elastic-apm-node/lib/instrumentation/run-context/AbstractRunContextManager.js:76:23)
at ServerHttp2Stream.emit (node:events:538:35)
at ServerHttp2Stream.emit (node:domain:475:12)
at endReadableNT (node:internal/streams/readable:1345:12)
at processTicksAndRejections (node:internal/process/task_queues:83:21)
I think a possible resolution here would be to surround this code in a try/catch: https://github.com/elastic/apm-agent-nodejs/blob/2f503a4011b63b4253bcfff8d9bb106c15c81c18/lib/parsers.js#L24-L28
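As a rough illustration of the try/catch idea, here is a minimal sketch. The helper names `safeSocketInfo` and `makeUnboundSocket` are hypothetical and are not part of elastic-apm-node; the sketch only assumes that reading a property on a disconnected HTTP/2 socket throws an error whose `code` is `ERR_HTTP2_SOCKET_UNBOUND`, as in the stack trace above.

```javascript
// Hypothetical helper, NOT the agent's actual code: read socket details
// defensively so a disconnected HTTP/2 session cannot crash the process.
function safeSocketInfo(req) {
  try {
    const socket = req.socket || req.connection;
    if (!socket) return undefined;
    return {
      remoteAddress: socket.remoteAddress,
      encrypted: Boolean(socket.encrypted),
    };
  } catch (err) {
    // Swallow only the "socket unbound" error; rethrow anything else.
    if (err && err.code === 'ERR_HTTP2_SOCKET_UNBOUND') return undefined;
    throw err;
  }
}

// Stand-in for a socket that is already detached from its Http2Session:
// every property read throws, similar to what the stack trace shows.
function makeUnboundSocket() {
  return new Proxy({}, {
    get() {
      const err = new Error('The socket has been disconnected from the Http2Session');
      err.code = 'ERR_HTTP2_SOCKET_UNBOUND';
      throw err;
    },
  });
}

const info = safeSocketInfo({ socket: makeUnboundSocket() });
console.log(info); // undefined, instead of an uncaught exception
```

Catching only the specific error code keeps unrelated bugs visible, at the cost of reporting the transaction without socket details.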
To Reproduce
This is difficult to reproduce. I’d assume you could do something like the following:
- Start an HTTP/2 server
- Send a request and close the connection before you completely receive a response
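The steps above could be sketched roughly as follows, using only Node's built-in `http2` module. The function name `reproduce` is hypothetical, and this sketch runs without the APM agent loaded, so it only shows the disconnect scenario itself. Whether it reliably triggers the error with the agent enabled is untested.

```javascript
const http2 = require('http2');

// Hypothetical repro sketch: the client drops the connection as soon as
// the server has received the request, before any response is sent.
function reproduce() {
  return new Promise((resolve, reject) => {
    let client;
    const server = http2.createServer();

    server.on('stream', (stream) => {
      stream.on('error', () => {}); // ignore reset errors from the abrupt close
      stream.on('close', () => {
        server.close();
        resolve(stream.destroyed);
      });
      // Deliberately never respond; tear down the client connection instead.
      client.destroy();
    });
    server.on('error', reject);

    server.listen(0, '127.0.0.1', () => {
      client = http2.connect(`http://127.0.0.1:${server.address().port}`);
      client.on('error', () => {});
      const req = client.request({ ':path': '/' });
      req.on('error', () => {});
      req.end();
    });
  });
}

reproduce().then((destroyed) => {
  console.log('server stream destroyed:', destroyed);
});
```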
Expected behavior
The APM agent should not throw an exception.
Environment (please complete the following information)
- OS: Linux (Debian)
- Node.js version: 16.14.2
- APM Server version: 7.17.0
- Agent version: 3.31.0
How are you starting the agent? (please tick one of the boxes)
- Calling agent.start() directly (e.g. require('elastic-apm-node').start(...))
- ~Requiring elastic-apm-node/start from within the source code~
- ~Starting node with -r elastic-apm-node/start~
Additional context
- Agent config options

    const apmAgent = apm.start({
      active: apmEnabled,
      secretToken,
      serverUrl: config.elasticApm.serverUrl,
      environment: config.elasticApm.environment,
      serviceName: '[REDACTED]',
      transactionSampleRate: 0.2,
      serviceVersion: '[REDACTED]',
      /** Ignore metrics & health transactions */
      transactionIgnoreUrls: '[REDACTED]',
      /** Use the Kubernetes pod name as the node name, if available */
      serviceNodeName: process.env.KUBERNETES_POD_NAME,
      logLevel: 'off',
    });

- package.json dependencies (only includes relevant packages)

    "dependencies": {
      "@grpc/grpc-js": "1.6.3",
      "@grpc/proto-loader": "0.6.9",
      "@protobuf-ts/runtime": "2.4.0",
      "@protobuf-ts/runtime-rpc": "2.4.0",
      "elastic-apm-node": "3.31.0",
    },
Top GitHub Comments
Thanks very much for reporting back.
I saw that slip. 😉
@curtdept That line, http2.js:238 (https://github.com/elastic/apm-agent-nodejs/blob/v3.33.0/lib/instrumentation/modules/http2.js#L238), is where the APM agent’s http2 instrumentation is (a) attempting to create a span for an outgoing http2.request() call, but (b) doesn’t create a span because the http2.request() call is not in the context of an APM transaction. On that line the APM agent is just calling the original function (i.e. the real http2.request(...)) after determining it isn’t going to capture any APM data.

I don’t think it indicates that the APM agent’s instrumentation is causing the issue. I can’t be sure without more details, though, and I don’t know how to start trying to reproduce it.
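To illustrate the pass-through behavior described above, here is a simplified sketch of a wrapper that falls back to the original function when there is no current transaction. The names `wrapRequest`, `getCurrentTransaction`, and `startSpan` are hypothetical stand-ins for illustration, not the agent's real http2.js code.

```javascript
// Illustrative only: a simplified wrapper, not the agent's real implementation.
function wrapRequest(original, getCurrentTransaction) {
  return function wrappedRequest(...args) {
    const transaction = getCurrentTransaction();
    if (!transaction) {
      // No transaction in context: capture nothing, just call the original.
      return original.apply(this, args);
    }
    // With a transaction, a span would be created around the call. Ending it
    // synchronously is a simplification; real instrumentation would likely
    // end the span when the request stream finishes.
    const span = transaction.startSpan('http2 request');
    try {
      return original.apply(this, args);
    } finally {
      span.end();
    }
  };
}

// Usage with stand-in objects: no transaction, so the call passes straight through.
const realRequest = (headers) => `requested ${headers[':path']}`;
const wrapped = wrapRequest(realRequest, () => null);
console.log(wrapped({ ':path': '/' })); // requested /
```

In the pass-through case the wrapper adds only one function call on top of the original, which is why a stack frame at that line does not by itself point to the instrumentation as the cause.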
It is probably best to open a separate issue, if you are willing, because this is a separate part of the instrumentation: wrapping outgoing HTTP/2 requests, rather than the wrapping of incoming HTTP/2 server requests.