
HTTP/2 stack creates excessive latency and throughput overhead

See original GitHub issue

Issue Type: Bug report

Linkerd introduces significant throughput and latency overhead for gRPC services compared to two processes communicating directly.

Here is a max qps strest-grpc run for each scenario:

Direct

> ./strest-grpc server --address "127.0.0.1:9999"
> ./strest-grpc client --address "127.0.0.1:9999" --totalRequests 100000 --streams 100 
{
  "good": 100000,
  "bad": 0,
  "bytes": 0,
  "latency": {
    "p50": 3,
    "p75": 4,
    "p90": 5,
    "p95": 5,
    "p99": 8,
    "p999": 19
  },
  "jitter": {
    "p50": 0,
    "p75": 0,
    "p90": 0,
    "p95": 0,
    "p99": 0,
    "p999": 0
  }
}

Via Linkerd

> ./strest-grpc client --address "127.0.0.1:4143" --totalRequests 100000 --streams 100 
{
  "good": 100000,
  "bad": 0,
  "bytes": 0,
  "latency": {
    "p50": 46,
    "p75": 53,
    "p90": 76,
    "p95": 122,
    "p99": 229,
    "p999": 412
  },
  "jitter": {
    "p50": 0,
    "p75": 0,
    "p90": 0,
    "p95": 0,
    "p99": 0,
    "p999": 0
  }
}

Configuration:

admin:
  port: 9990
  ip: 0.0.0.0

routers: 
  - label: h2-in
    protocol: h2
    experimental: true
    client:
      initialStreamWindowBytes: 1048576
      failureAccrual:
        kind: none
    servers:
      - port: 4143
        ip: 0.0.0.0
        maxConcurrentStreamsPerConnection: 2147483647
        initialStreamWindowBytes: 1048576
    dtab: |
       /svc/* => /$/inet/127.0.0.1/9999;
    identifier:
      kind: io.l5d.header.path
      segments: 1

In addition to using strest-grpc, I also wrote a custom but crude benchmarking tool of my own, echobench, so that I could experiment with gRPC channel settings and socket options.
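To give a sense of the knobs involved, the settings echobench experiments with look roughly like the following grpc-java sketch. This is illustrative only; the builder values and option choices here are assumptions, not echobench's actual code:

import io.grpc.ManagedChannel;
import io.grpc.netty.NettyChannelBuilder;
import io.netty.channel.ChannelOption;

public class ChannelTuning {
    // Illustrative sketch of the gRPC channel settings and socket options to experiment with.
    static ManagedChannel buildChannel(String host, int port) {
        return NettyChannelBuilder.forAddress(host, port)
                .usePlaintext()                              // benchmark traffic runs over plaintext h2c
                .flowControlWindow(1048576)                  // widen the HTTP/2 flow-control window
                .withOption(ChannelOption.TCP_NODELAY, true) // socket option to toggle per experiment
                .build();
    }
}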

The custom tool also reports a significant amount of overhead when using linkerd:

Direct

=== summary ===
threads: 10 (5000 reqs per thread)
requests: 50000
throughput: 2922.804486159504/s
errors: 0
latency:
- min: 1ms
- max: 17ms
- median: 2.0ms
- avg: 2.8737975505112257ms
- p95: 7.0ms
- p99: 11.0ms

Via Linkerd

=== summary ===
threads: 10 (5000 reqs per thread)
requests: 50000
throughput: 629.5124804747321/s
errors: 0
latency:
- min: 6ms
- max: 34ms
- median: 13.0ms
- avg: 13.981156425923315ms
- p95: 20.0ms
- p99: 27.0ms

Checking the metrics.json endpoint after a test run shows an interesting discrepancy between request and stream latencies. Stream latencies appear to be much higher:

Request latencies

  "rt/h2-in/client/$/inet/127.0.0.1/9999/service/svc/echo.EchoService/request_latency_ms.count": 53932,
  "rt/h2-in/client/$/inet/127.0.0.1/9999/service/svc/echo.EchoService/request_latency_ms.max": 54,
  "rt/h2-in/client/$/inet/127.0.0.1/9999/service/svc/echo.EchoService/request_latency_ms.min": 0,
  "rt/h2-in/client/$/inet/127.0.0.1/9999/service/svc/echo.EchoService/request_latency_ms.p50": 6,
  "rt/h2-in/client/$/inet/127.0.0.1/9999/service/svc/echo.EchoService/request_latency_ms.p90": 16,
  "rt/h2-in/client/$/inet/127.0.0.1/9999/service/svc/echo.EchoService/request_latency_ms.p95": 22,
  "rt/h2-in/client/$/inet/127.0.0.1/9999/service/svc/echo.EchoService/request_latency_ms.p99": 31,
  "rt/h2-in/client/$/inet/127.0.0.1/9999/service/svc/echo.EchoService/request_latency_ms.p9990": 41,
  "rt/h2-in/client/$/inet/127.0.0.1/9999/service/svc/echo.EchoService/request_latency_ms.p9999": 50,
  "rt/h2-in/client/$/inet/127.0.0.1/9999/service/svc/echo.EchoService/request_latency_ms.sum": 390436,
  "rt/h2-in/client/$/inet/127.0.0.1/9999/service/svc/echo.EchoService/request_latency_ms.avg": 7.239412593636431,

Stream latencies

  "rt/h2-in/client/$/inet/127.0.0.1/9999/service/svc/echo.EchoService/response/stream/stream_duration_ms.count": 53945,
  "rt/h2-in/client/$/inet/127.0.0.1/9999/service/svc/echo.EchoService/response/stream/stream_duration_ms.max": 139,
  "rt/h2-in/client/$/inet/127.0.0.1/9999/service/svc/echo.EchoService/response/stream/stream_duration_ms.min": 10,
  "rt/h2-in/client/$/inet/127.0.0.1/9999/service/svc/echo.EchoService/response/stream/stream_duration_ms.p50": 46,
  "rt/h2-in/client/$/inet/127.0.0.1/9999/service/svc/echo.EchoService/response/stream/stream_duration_ms.p90": 59,
  "rt/h2-in/client/$/inet/127.0.0.1/9999/service/svc/echo.EchoService/response/stream/stream_duration_ms.p95": 63,
  "rt/h2-in/client/$/inet/127.0.0.1/9999/service/svc/echo.EchoService/response/stream/stream_duration_ms.p99": 79,
  "rt/h2-in/client/$/inet/127.0.0.1/9999/service/svc/echo.EchoService/response/stream/stream_duration_ms.p9990": 104,
  "rt/h2-in/client/$/inet/127.0.0.1/9999/service/svc/echo.EchoService/response/stream/stream_duration_ms.p9999": 135,
  "rt/h2-in/client/$/inet/127.0.0.1/9999/service/svc/echo.EchoService/response/stream/stream_duration_ms.sum": 2538980,
  "rt/h2-in/client/$/inet/127.0.0.1/9999/service/svc/echo.EchoService/response/stream/stream_duration_ms.avg": 47.06608582815831,

Another interesting peculiarity is that only one stream is ever shown as open back to the test service, despite several threads sending requests simultaneously:

"rt/h2-in/client/$/inet/127.0.0.1/9999/stream/open_streams": 1

After discovering the overhead I attempted several configuration and code changes to alleviate this issue, including:

  • Manually setting max concurrent stream limits to their maximum values on both ends of the connection (see the sketch after this list)
  • Manually expanding the HTTP/2 flow-control window sizes to their maximum allowable values on both ends of the connection
  • Setting the retry buffer sizes to 0
  • Removing the ClassifiedRetries module from the Finagle client stack entirely in code
  • Manually expanding the maximum frame size to its maximum allowable value
  • Removing the DelayedReleaseService from the Finagle client stack in code

None of these changes made a noticeable impact on the latency and throughput.
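For reference, the first two tweaks look roughly like this on the grpc-java side of the test setup. This is a hedged sketch, not the exact code used; the linkerd-side equivalents are the maxConcurrentStreamsPerConnection and initialStreamWindowBytes settings shown in the config above:

import io.grpc.BindableService;
import io.grpc.Server;
import io.grpc.netty.NettyServerBuilder;

public class ServerTuning {
    // Illustrative only: raise the HTTP/2 limits to their maximums on the gRPC server end.
    static Server buildServer(int port, BindableService service) {
        return NettyServerBuilder.forPort(port)
                .maxConcurrentCallsPerConnection(Integer.MAX_VALUE) // mirrors maxConcurrentStreamsPerConnection
                .flowControlWindow(1048576)                         // mirrors initialStreamWindowBytes
                .addService(service)
                .build();
    }
}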

I took several tcpdumps to examine the behavior of the service communication under both scenarios. A big difference in the number of captured packets immediately jumps out: 2966 for direct versus 22542 via linkerd.

Looking at the tcpdumps in Wireshark confirms why: when communicating directly the client is able to multiplex HTTP/2 frames from ~120 streams into a single TCP packet. Linkerd, by comparison, creates a TCP packet for each HTTP/2 stream frame.

Linkerd packets

[Wireshark screenshot of the linkerd capture]

Direct packets

[Wireshark screenshot of the direct capture]

The discrepancy in latency and throughput could be attributed to the additional syscall and packetization overhead required to forward HTTP/2 traffic in this way.

I found the discrepancy surprising given that both linkerd and grpc-java are built on the netty4 HTTP/2 stack primitives. It turns out that while the grpc-java stack was being developed, its authors identified a problem with netty4’s H2 stack flushing too frequently.

The fix on their side was a write queue that batches writes before flushing (the relevant links are in the original issue).

The benchmarks listed in the write queue PR indicate this change unlocked a significant jump in throughput, especially in the case of many streams (16806.718 ops/s versus 55975.008 ops/s for 1000 streams).

I don’t have an easy way to confirm how much a write queue will reduce the performance overhead I’m observing. I’m open to suggestions for a quick and dirty way to get more confidence in this as a potential solution.
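One quick-and-dirty experiment might be to drop Netty’s stock FlushConsolidationHandler into the proxy’s client pipeline. It isn’t grpc-java’s write queue, but it approximates the same “batch writes, flush less often” behavior, and comparing tcpdumps with and without it should show whether frames start coalescing into fewer packets. A minimal sketch (the actual pipeline wiring inside linkerd will of course differ):

import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.flush.FlushConsolidationHandler;

public class FlushCoalescingInitializer extends ChannelInitializer<SocketChannel> {
    @Override
    protected void initChannel(SocketChannel ch) {
        // Consolidate up to 256 flushes into one, even while no read is in progress,
        // so many small HTTP/2 frames can share a single write/flush (and TCP packet).
        ch.pipeline().addFirst(new FlushConsolidationHandler(256, true));
        // ... the rest of the HTTP/2 pipeline (codec, frame handlers) would follow here.
    }
}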

I did try enabling Nagle’s algorithm (removing the TCP_NODELAY option) on the client socket in linkerd, and after a few subsequent test runs I was able to get throughput to increase in a single-threaded use case.
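For anyone reproducing that experiment, it amounts to flipping a single Netty socket option on the outbound channel; a minimal sketch, not linkerd’s actual configuration code:

import io.netty.bootstrap.Bootstrap;
import io.netty.channel.ChannelOption;

public class NagleToggle {
    // Illustrative only: with TCP_NODELAY disabled, Nagle's algorithm coalesces small
    // writes into fewer packets, trading a little per-write latency for fewer syscalls.
    static Bootstrap withNagle(Bootstrap bootstrap) {
        return bootstrap.option(ChannelOption.TCP_NODELAY, false);
    }
}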

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 7
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

2 reactions
wmorgan commented, Sep 19, 2018

Wow. Thank you for the excellent writeup!

0 reactions
evhfla-zz commented, Sep 26, 2018

Thanks for the quick reply @zackangelo …hoping that Alex can have a fix soon.


