Disconnected HTTP2 sessions cause a crash when reporting transactions


Describe the bug

We’re using the agent to instrument gRPC calls. Because @grpc/grpc-js uses Node’s internal HTTP/2 server, the agent is able to auto-instrument all incoming requests. We’ve also added some instrumentation information ourselves.

We’ve found that very intermittently, some requests are causing process crashes. We’ve had to disable APM as a result.

Error thrown:

Error [ERR_HTTP2_SOCKET_UNBOUND]: The socket has been disconnected from the Http2Session
    at new NodeError (node:internal/errors:371:5)
    at Object.get (node:internal/http2/core:867:17)
    at Object.getContextFromRequest (/opt/app/node_modules/elastic-apm-node/lib/parsers.js:24:32)
    at Transaction.toJSON (/opt/app/node_modules/elastic-apm-node/lib/instrumentation/transaction.js:206:41)
    at Transaction._encode (/opt/app/node_modules/elastic-apm-node/lib/instrumentation/transaction.js:237:15)
    at Instrumentation.addEndedTransaction (/opt/app/node_modules/elastic-apm-node/lib/instrumentation/index.js:315:63)
    at Transaction.end (/opt/app/node_modules/elastic-apm-node/lib/instrumentation/transaction.js:288:32)
    at ServerHttp2Stream.<anonymous> (/opt/app/node_modules/elastic-apm-node/lib/instrumentation/modules/http2.js:80:17)
    at ServerHttp2Stream.f (/opt/app/node_modules/once/once.js:25:25)
    at ServerHttp2Stream.onend (/opt/app/node_modules/end-of-stream/index.js:36:27)
    at /opt/app/node_modules/elastic-apm-node/lib/instrumentation/run-context/AbstractRunContextManager.js:76:49
    at AsyncHooksRunContextManager.with (/opt/app/node_modules/elastic-apm-node/lib/instrumentation/run-context/BasicRunContextManager.js:49:17)
    at ServerHttp2Stream.wrapper (/opt/app/node_modules/elastic-apm-node/lib/instrumentation/run-context/AbstractRunContextManager.js:76:23)
    at ServerHttp2Stream.emit (node:events:538:35)
    at ServerHttp2Stream.emit (node:domain:475:12)
    at endReadableNT (node:internal/streams/readable:1345:12)
    at processTicksAndRejections (node:internal/process/task_queues:83:21)

I think a possible resolution here would be to surround this code in a try/catch: https://github.com/elastic/apm-agent-nodejs/blob/2f503a4011b63b4253bcfff8d9bb106c15c81c18/lib/parsers.js#L24-L28
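
As a rough illustration of that suggestion, here is a minimal sketch of guarding the socket reads. It is not the agent's actual code: `getSocketContext` and `makeUnboundSocket` are hypothetical names, and the `Proxy` only stands in for a socket whose HTTP/2 session has already gone away.

```javascript
'use strict'

// Hypothetical helper (not taken from elastic-apm-node): read socket details
// from a request-like object, tolerating a socket that is no longer bound to
// its Http2Session.
function getSocketContext (req) {
  try {
    return {
      remoteAddress: req.socket.remoteAddress,
      encrypted: Boolean(req.socket.encrypted)
    }
  } catch (err) {
    // Only swallow the specific error from the report; rethrow anything else.
    if (err && err.code === 'ERR_HTTP2_SOCKET_UNBOUND') {
      return null
    }
    throw err
  }
}

// Stand-in for a disconnected socket: every property read throws, which is
// the behaviour the stack trace above shows. This is a simulation only.
function makeUnboundSocket () {
  return new Proxy({}, {
    get () {
      const err = new Error('The socket has been disconnected from the Http2Session')
      err.code = 'ERR_HTTP2_SOCKET_UNBOUND'
      throw err
    }
  })
}
```

With this shape, a transaction whose connection has already dropped would be reported without socket details instead of crashing the process. Matching on the error code keeps unrelated failures visible.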

To Reproduce

This is difficult to reproduce, but I’d assume you could do something like the following:

  • Start an HTTP/2 server
  • Send a request and close the connection before you completely receive a response

Expected behavior

The APM agent should not throw an exception.

Environment (please complete the following information)

  • OS: Linux (Debian)
  • Node.js version: 16.14.2
  • APM Server version: 7.17.0
  • Agent version: 3.31.0

How are you starting the agent? (please tick one of the boxes)

  • Calling agent.start() directly (e.g. require('elastic-apm-node').start(...))
  • ~Requiring elastic-apm-node/start from within the source code~
  • ~Starting node with -r elastic-apm-node/start~

Additional context

  • Agent config options

        const apmAgent = apm.start({
            active: apmEnabled,
            secretToken,
            serverUrl: config.elasticApm.serverUrl,
            environment: config.elasticApm.environment,
            serviceName: '[REDACTED]',
            transactionSampleRate: 0.2,
            serviceVersion: '[REDACTED]',
            /** Ignore metrics & health transactions */
            transactionIgnoreUrls: '[REDACTED]',
            /** Use the Kubernetes pod name as the node name, if available */
            serviceNodeName: process.env.KUBERNETES_POD_NAME,
            logLevel: 'off',
        });
    
  • package.json dependencies

    Only includes relevant packages
        "dependencies": {
            "@grpc/grpc-js": "1.6.3",
            "@grpc/proto-loader": "0.6.9",
            "@protobuf-ts/runtime": "2.4.0",
            "@protobuf-ts/runtime-rpc": "2.4.0",
            "elastic-apm-node": "3.31.0",
        },
    

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 12 (6 by maintainers)

Top GitHub Comments

1 reaction
trentm commented, May 6, 2022

Thanks very much for reporting back.

looking into it.

I saw that slip. 😉

1 reaction
trentm commented, May 6, 2022

@curtdept That line, http2.js:238,

https://github.com/elastic/apm-agent-nodejs/blob/v3.33.0/lib/instrumentation/modules/http2.js#L238

is where the APM agent’s http2 instrumentation is (a) attempting to create a span for an outgoing http2.request() call, but (b) doesn’t create a span because the http2.request() call is not in the context of an APM transaction. On that line the APM agent is just calling the original function (i.e. the real http2.request(...)) after determining it isn’t going to capture any APM data.

I don’t think it indicates that the APM agent’s instrumentation is causing the issue. I don’t know that without more details, though. I don’t know how to start trying to reproduce.

It is probably best to open a separate issue, if you are willing, because this is a separate part of the instrumentation: wrapping outgoing HTTP/2 requests, rather than the wrapping of incoming HTTP/2 server requests.
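
The pass-through behaviour described in this comment can be sketched roughly as follows. This is an illustration only, not the agent's implementation: `wrapRequest`, `getCurrentTransaction`, and the `spans` array are hypothetical names chosen for the example.

```javascript
'use strict'

// Hypothetical wrapper around an outgoing-request function such as
// http2.request(). All names here are illustrative assumptions.
function wrapRequest (originalRequest, getCurrentTransaction) {
  return function wrappedRequest (...args) {
    const transaction = getCurrentTransaction()
    if (!transaction) {
      // Not inside a transaction: capture no data and simply call the
      // original function with the caller's `this` and arguments.
      return originalRequest.apply(this, args)
    }
    // Inside a transaction: record a span, then call the original function.
    transaction.spans.push({ name: 'http2 request', startedAt: Date.now() })
    return originalRequest.apply(this, args)
  }
}
```

In the no-transaction branch the wrapper adds nothing beyond one extra stack frame, which is why a frame at that line does not by itself show that the instrumentation caused a failure.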

