question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

"GOAWAY received","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":1083,"http2_error":11,"raw_bytes":"too_many_pings"

See original GitHub issue

We are running streaming grpc connection to transfer a lot of messages and data. We constantly see that the gRPC server fails after sometime with “too_many_pings” errors.

The error is very consistent happening roughly at the same time and data volume.

Code for server creation https://github.com/apache/beam/blob/04a647d8c90ecf70d77aaa9f16a93f3cc568f9d1/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/ServerFactory.java#L247

Code for client creation https://github.com/apache/beam/blob/04a647d8c90ecf70d77aaa9f16a93f3cc568f9d1/sdks/python/apache_beam/runners/worker/data_plane.py#L353

Below are the relevant grpc debug logs. Please let me know if more logs are needed.

What version of gRPC are you using?

Server: Java 1.13.1 Client: Python 1.12.0

What did you expect to see?

Long running GRPC connection with no error. Relevant log on server

2018-12-19 11:41:51,017 DEBUG org.apache.beam.vendor.grpc.v1_13_1.io.grpc.netty.NettyServerHandler  - [id: 0xd975c621, L:/0:0:0:0:0:0:0:1:37459 - R:/0:0:0:0:0:0:0:1:54512] INBOUND PING: ack=false bytes=10
2018-12-19 11:41:51,017 DEBUG org.apache.beam.vendor.grpc.v1_13_1.io.grpc.netty.NettyServerHandler  - [id: 0xd975c621, L:/0:0:0:0:0:0:0:1:37459 - R:/0:0:0:0:0:0:0:1:54512] OUTBOUND PING: ack=true bytes=10
2018-12-19 11:41:51,017 DEBUG org.apache.beam.vendor.grpc.v1_13_1.io.grpc.netty.NettyServerHandler  - [id: 0xd975c621, L:/0:0:0:0:0:0:0:1:37459 - R:/0:0:0:0:0:0:0:1:54512] OUTBOUND GO_AWAY: lastStreamId=1 errorCode=11 length=14 bytes=746f6f5f6d616e795f70696e6773
2018-12-19 11:41:51,031 DEBUG org.apache.beam.vendor.grpc.v1_13_1.io.grpc.netty.NettyServerHandler  - [id: 0xd975c621, L:/0:0:0:0:0:0:0:1:37459 ! R:/0:0:0:0:0:0:0:1:54512] OUTBOUND RST_STREAM: streamId=1 errorCode=8
2018-12-19 11:41:51,031 ERROR org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer         - Failed to handle for unknown endpoint
org.apache.beam.vendor.grpc.v1_13_1.io.grpc.StatusRuntimeException: CANCELLED: cancelled before receiving half close
        at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.Status.asRuntimeException(Status.java:517)
        at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onCancel(ServerCalls.java:272)
        at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.PartialForwardingServerCallListener.onCancel(PartialForwardingServerCallListener.java:40)
        at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.ForwardingServerCallListener.onCancel(ForwardingServerCallListener.java:23)
        at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onCancel(ForwardingServerCallListener.java:40)
        at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.Contexts$ContextualizedServerCallListener.onCancel(Contexts.java:96)
        at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.closed(ServerCallImpl.java:293)
        at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1Closed.runInContext(ServerImpl.java:738)
        at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Relevant logs on client

I1219 20:16:42.822471431     220 ev_epoll1_linux.cc:928]     PS:0x55beb3b22bc8 END_WORKER:0x7f84ee563d90
I1219 20:16:42.822475861      62 call_combiner.cc:120]         checking queue
I1219 20:16:42.822503665     220 ev_epoll1_linux.cc:908]      .. mark pollset 0x55beb3b22bc8 inactive
I1219 20:16:42.822517153      62 call_combiner.cc:134]         EXECUTING FROM QUEUE: closure=0x7f84c00030c0 error="No Error"
I1219 20:16:42.822527045     220 ev_epoll1_linux.cc:879]      .. choose next poller to be 0x7f84d11f5d90
I1219 20:16:42.822540820      62 call_combiner.cc:69]        ==> grpc_call_combiner_start() [0x7f84c0001e60] closure=0x7f84c00030f8 [continue recv_trailing_metadata] error="No Error"
I1219 20:16:42.822564311     220 ev_epoll1_linux.cc:990]      .. remove worker
I1219 20:16:42.822571738      58 ev_epoll1_linux.cc:836]     PS:0x55beb3695478 BEGIN_DONE:0x7f84d11f5d90 kick_state=DESIGNATED_POLLER shutdown=0 kicked_without_poller: 0
I1219 20:16:42.822564482      62 call_combiner.cc:78]          size: 1 -> 2
I1219 20:16:42.822600785     220 completion_queue.cc:1063]   RETURN_EVENT[0x55beb3b22b00]: OP_COMPLETE: tag:0x7f84f78f3d70 OK
I1219 20:16:42.822614871      62 call_combiner.cc:93]          QUEUING
I1219 20:16:42.822636361      62 call_combiner.cc:106]       ==> grpc_call_combiner_stop() [0x7f84c0001e60] [recv_initial_metadata_ready]
I1219 20:16:42.822653326      62 call_combiner.cc:113]         size: 2 -> 1
I1219 20:16:42.822668732      62 call_combiner.cc:120]         checking queue
I1219 20:16:42.822687261      62 call_combiner.cc:134]         EXECUTING FROM QUEUE: closure=0x7f84c00030f8 error="No Error"
I1219 20:16:42.822711978      62 completion_queue.cc:699]    cq_end_op_for_next(cq=0x55beb3b22b00, tag=0x7f83ee0befb0, error="No Error", done=0x7f84f835d3f0, done_arg=0x7f84c0004390, storage=0x7f84c00043e0)
D1219 20:16:42.822741423      62 ev_epoll1_linux.cc:1081]    PS:0x55beb3b22bc8 KICK:(nil) curps=0x7f84bc003258 curworker=(nil) root=(nil)
I1219 20:16:42.822757894      62 ev_epoll1_linux.cc:1092]     .. kicked_without_poller
I1219 20:16:42.822778130      62 call_combiner.cc:106]       ==> grpc_call_combiner_stop() [0x7f84c0001e60] [recv_trailing_metadata_ready]
I1219 20:16:42.822796773      62 call_combiner.cc:113]         size: 1 -> 0
I1219 20:16:42.822816656      62 call_combiner.cc:141]         queue empty
D1219 20:16:42.822844119      62 call.cc:719]                set_final_status CLI
D1219 20:16:42.822876527      62 call.cc:720]                {"created":"@1545250602.822833237","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1036,"grpc_message":"Socket closed","grpc_status":14}
I1219 20:16:42.822907338      62 completion_queue.cc:699]    cq_end_op_for_next(cq=0x55beb3b22b00, tag=0x7f83ee245470, error="No Error", done=0x7f84f835d3f0, done_arg=0x7f84c0002d20, storage=0x7f84c0002d70)
D1219 20:16:42.822931677      62 flow_control.cc:113]        0x7f84c000af18[0][cli] | t updt sent | trw:                       1047380, ttw:                       4194309, taw:                       4194309, srw:                              , slw:                              , saw:                              
I1219 20:16:42.822955643      62 chttp2_transport.cc:852]    W:0x7f84c00098d0 CLIENT state WRITING -> WRITING [begin write in current thread]
I1219 20:16:42.823004118     220 completion_queue.cc:963]    grpc_completion_queue_next(cq=0x55beb3b22b00, deadline=gpr_timespec { tv_sec: 1545250603, tv_nsec: 22984696, clock_type: 1 }, reserved=(nil))
I1219 20:16:42.823046652      58 ev_epoll1_linux.cc:728]     ps: 0x55beb3695478 poll got 1 events
I1219 20:16:42.823048633      62 subchannel_list.h:447]      [pick_first 0x7f84c40024e0] subchannel list 0x7f84c4013ff0 index 0 of 2 (subchannel 0x7f84c4003800): connectivity changed: state=TRANSIENT_FAILURE, error={"created":"@1545250602.821364969","description":"GOAWAY received","file":"src/core/ext/transport/chttp
2/transport/chttp2_transport.cc","file_line":1147,"http2_error":11,"raw_bytes":"too_many_pings"}, shutting_down=0
D1219 20:16:42.823063371     220 ev_epoll1_linux.cc:1081]    PS:0x55beb3b22bc8 KICK:(nil) curps=(nil) curworker=(nil) root=(nil)
E1219 20:16:42.823092992      58 lockfree_event.cc:213]      LockfreeEvent::SetReady: 0x7f84c4000c88 curr=(nil)
I1219 20:16:42.823101846      62 pick_first.cc:461]          Pick First 0x7f84c40024e0 connectivity changed for selected subchannel
I1219 20:16:42.823121047      62 connectivity_state.cc:164]  SET: 0x7f84c40026c8 pick_first: READY --> IDLE [selected_changed+reresolve] error=(nil) "No Error"
I1219 20:16:42.823114973     220 ev_epoll1_linux.cc:1092]     .. kicked_without_poller
E1219 20:16:42.823115591      58 lockfree_event.cc:213]      LockfreeEvent::SetReady: 0x7f84c4000c78 curr=0x2
I1219 20:16:42.823137959      62 connectivity_state.cc:190]  NOTIFY: 0x7f84c40026c8 pick_first: 0x7f84c0007c18
I1219 20:16:42.823150194     220 completion_queue.cc:1063]   RETURN_EVENT[0x55beb3b22b00]: OP_COMPLETE: tag:0x7f83ee0befb0 OK
E1219 20:16:42.823167604      58 lockfree_event.cc:213]      LockfreeEvent::SetReady: 0x7f84c4000c80 curr=0x2
I1219 20:16:42.823177467      62 lb_policy.cc:47]            pick_first 0x7f84c40024e0: scheduling re-resolution closure with error="No Error".
I1219 20:16:42.823189882      58 ev_epoll1_linux.cc:928]     PS:0x55beb3695478 END_WORKER:0x7f84d11f5d90
I1219 20:16:42.823195980      62 subchannel_list.h:374]      [pick_first 0x7f84c40024e0] subchannel list 0x7f84c4013ff0 index 0 of 2 (subchannel 0x7f84c4003800): stopping connectivity watch
I1219 20:16:42.823220322      58 ev_epoll1_linux.cc:908]      .. mark pollset 0x55beb3695478 inactive
I1219 20:16:42.823234335      62 client_channel.cc:214]      chand=0x55beb3b09e78: lb_policy=0x7f84c40024e0 state changed to IDLE
I1219 20:16:42.823238624      58 ev_epoll1_linux.cc:990]      .. remove worker
I1219 20:16:42.823261347      62 client_channel.cc:196]      chand=0x55beb3b09e78: setting connectivity state to IDLE
I1219 20:16:42.823281386      58 ev_epoll1_linux.cc:748]     PS:0x55beb3695478 BEGIN_STARTS:0x7f84d11f5d90
I1219 20:16:42.823298138      62 connectivity_state.cc:164]  SET: 0x55beb3b09f00 client_channel: READY --> IDLE [lb_changed] error=(nil) "No Error"
I1219 20:16:42.823298643      58 ev_epoll1_linux.cc:767]     PS:0x55beb3695478 BEGIN_REORG:0x7f84d11f5d90 kick_state=UNKICKED is_reassigning=1
I1219 20:16:42.823340813      62 connectivity_state.cc:116]  CONWATCH: 0x7f84c40026c8 pick_first: from IDLE [cur=IDLE] notify=0x7f8490001b48
I1219 20:16:42.823354538      58 ev_epoll1_linux.cc:836]     PS:0x55beb3695478 BEGIN_DONE:0x7f84d11f5d90 kick_state=DESIGNATED_POLLER shutdown=0 kicked_without_poller: 0
I1219 20:16:42.823358721      62 client_channel.cc:306]      chand=0x55beb3b09e78: started name re-resolving
D1219 20:16:42.823394577      62 dns_resolver.cc:279]        Start resolving.
I1219 20:16:42.823418368      62 executor.cc:243]            EXECUTOR (resolver-executor) try to schedule 0x7f8490001000 (short) to thread 0
I1219 20:16:42.823451488      62 chttp2_transport.cc:852]    W:0x7f84c00098d0 CLIENT state WRITING -> IDLE [finish writing]
I1219 20:16:42.823470502      44 executor.cc:188]            EXECUTOR (resolver-executor) [0]: execute
I1219 20:16:42.823473482      62 chttp2_transport.cc:2790]   set connectivity_state=4
I1219 20:16:42.823518606      44 executor.cc:73]             EXECUTOR (resolver-executor) run 0x7f8490001000
I1219 20:16:42.823586626      62 connectivity_state.cc:164]  SET: 0x7f84c0009b78 client_transport: TRANSIENT_FAILURE --> SHUTDOWN [close_transport] error=0x7f84900014f0 {"created":"@1545250602.822223145","description":"Delayed close due to in-progress write","file":"src/core/ext/transport/chttp2/transport/chttp2_tran
sport.cc","file_line":609,"referenced_errors":[{"created":"@1545250602.821928707","description":"Endpoint read failed","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":2510,"occurred_during_write":0,"referenced_errors":[{"created":"@1545250602.821703024","description":"Socket closed","
fd":6,"file":"src/core/lib/iomgr/tcp_posix.cc","file_line":465,"grpc_status":14,"target_address":"ipv6:[::1]:44439"},{"created":"@1545250602.821364969","description":"GOAWAY received","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":1147,"http2_error":11,"raw_bytes":"too_many_pings"}]}
]}
I1219 20:16:42.823608583      62 timer_generic.cc:467]       TIMER 0x7f84c000b128: CANCEL pending=true
I1219 20:16:42.823626241      62 ev_posix.cc:269]            (fd-trace) fd_shutdown(6)
E1219 20:16:42.823656433      62 lockfree_event.cc:164]      LockfreeEvent::SetShutdown: 0x7f84c4000c78 curr=0x2 err={"created":"@1545250602.822223145","description":"Delayed close due to in-progress write","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":609,"referenced_errors":[{"cre
ated":"@1545250602.821928707","description":"Endpoint read failed","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":2510,"occurred_during_write":0,"referenced_errors":[{"created":"@1545250602.821703024","description":"Socket closed","fd":6,"file":"src/core/lib/iomgr/tcp_posix.cc","file
_line":465,"grpc_status":14,"target_address":"ipv6:[::1]:44439"},{"created":"@1545250602.821364969","description":"GOAWAY received","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":1147,"http2_error":11,"raw_bytes":"too_many_pings"}]}]}
E1219 20:16:42.823691701      62 lockfree_event.cc:164]      LockfreeEvent::SetShutdown: 0x7f84c4000c80 curr=0x2 err={"created":"@1545250602.822223145","description":"Delayed close due to in-progress write","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":609,"referenced_errors":[{"cre
ated":"@1545250602.821928707","description":"Endpoint read failed","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":2510,"occurred_during_write":0,"referenced_errors":[{"created":"@1545250602.821703024","description":"Socket closed","fd":6,"file":"src/core/lib/iomgr/tcp_posix.cc","file
_line":465,"grpc_status":14,"target_address":"ipv6:[::1]:44439"},{"created":"@1545250602.821364969","description":"GOAWAY received","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":1147,"http2_error":11,"raw_bytes":"too_many_pings"}]}]}
I1219 20:16:42.823703027      58 ev_epoll1_linux.cc:728]     ps: 0x55beb3695478 poll got 1 events
E1219 20:16:42.823713652      62 lockfree_event.cc:164]      LockfreeEvent::SetShutdown: 0x7f84c4000c88 curr=0x2 err={"created":"@1545250602.822223145","description":"Delayed close due to in-progress write","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":609,"referenced_errors":[{"cre
ated":"@1545250602.821928707","description":"Endpoint read failed","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":2510,"occurred_during_write":0,"referenced_errors":[{"created":"@1545250602.821703024","description":"Socket closed","fd":6,"file":"src/core/lib/iomgr/tcp_posix.cc","file
_line":465,"grpc_status":14,"target_address":"ipv6:[::1]:44439"},{"created":"@1545250602.821364969","description":"GOAWAY received","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":1147,"http2_error":11,"raw_bytes":"too_many_pings"}]}]}

Issue Analytics

  • State:closed
  • Created 5 years ago
  • Comments:16 (9 by maintainers)

github_iconTop GitHub Comments

1reaction
ejona86commented, May 28, 2019

@pkwarren, sent out #5795 to remove the API.

0reactions
ejona86commented, Sep 17, 2020

grpc/grpc#17667 has been resolved. That should fix this issue.

Read more comments on GitHub >

github_iconTop Results From Across the Web

No results found

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found