Benchmark scenario "dotnet_protobuf_async_unary_ping_pong_1MB" is highly flaky
Also see internal bug b/253283712.
dotnet_protobuf_async_unary_ping_pong_1MB is run with grpc_e2e_performance_gke on both 30-core and 8-core machines.
Since https://source.cloud.google.com/results/invocations/38c6078b-4413-488b-81a9-22a71226dcc8/targets (the job was broken before that) there have been multiple failures of this test.
It typically fails with the driver process hanging, and the load test eventually times out.
Top GitHub Comments
I’ve managed to capture a network trace.
What should happen is that the client sends 1MB of data to the server and the server sends it straight back unchanged - this is via a UnaryCall.
When the hang occurs, the client sends the data, the server responds with an HTTP/2 header of status: 200 OK and starts sending the response data, but the network goes quiet before the server has sent back all of the data.
I would expect many more DATA[377] packets to be sent by the server, which is the case for successful responses earlier in the test.
FYI, I've managed to reproduce the hang on both Linux and Windows.
I run the qps driver in a loop. On my Linux VM I can get it to fail after somewhere between 2 and 50 iterations. On my Windows laptop it failed after 94 iterations.
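For anyone who wants to repeat this kind of loop, here is a minimal sketch of the idea in Python. It is not the script I actually used: the function name run_until_hang, the iteration limit and the timeout are all made up for illustration, and you would need to substitute your own qps driver command line. A run that exceeds the timeout is treated as a hang.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_until_hang(cmd: Sequence[str], max_iterations: int,
                   timeout_s: float) -> Optional[int]:
    """Run cmd repeatedly and report the first iteration that hangs.

    Returns the 1-based iteration number of the first run that exceeded
    timeout_s, or None if every run finished in time. The exit code is
    deliberately ignored; only a timeout counts as a hang here.
    """
    for iteration in range(1, max_iterations + 1):
        try:
            # subprocess.run kills the child process if the timeout expires
            # and then raises TimeoutExpired.
            subprocess.run(cmd, timeout=timeout_s, check=False,
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)
        except subprocess.TimeoutExpired:
            return iteration
    return None


# Example with a placeholder command (replace with the real driver invocation):
#   hung_at = run_until_hang(["<your qps driver command>"], 100, 300.0)
```

The timeout needs to be comfortably longer than a normal run of the scenario, otherwise slow but healthy runs would be reported as hangs.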
I’ll investigate further.