"GOAWAY received","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":1083,"http2_error":11,"raw_bytes":"too_many_pings"
We are running a streaming gRPC connection to transfer a large number of messages and a lot of data. We consistently see the gRPC server fail after some time with a "too_many_pings" error.
The failure is very reproducible, occurring at roughly the same elapsed time and data volume each run.
Code for server creation: https://github.com/apache/beam/blob/04a647d8c90ecf70d77aaa9f16a93f3cc568f9d1/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/ServerFactory.java#L247
Code for client creation: https://github.com/apache/beam/blob/04a647d8c90ecf70d77aaa9f16a93f3cc568f9d1/sdks/python/apache_beam/runners/worker/data_plane.py#L353
Below are the relevant gRPC debug logs. Please let me know if more logs are needed.
What version of gRPC are you using?
Server: Java 1.13.1; Client: Python 1.12.0
What did you expect to see?
A long-running gRPC connection with no errors.
Relevant logs on the server:
2018-12-19 11:41:51,017 DEBUG org.apache.beam.vendor.grpc.v1_13_1.io.grpc.netty.NettyServerHandler - [id: 0xd975c621, L:/0:0:0:0:0:0:0:1:37459 - R:/0:0:0:0:0:0:0:1:54512] INBOUND PING: ack=false bytes=10
2018-12-19 11:41:51,017 DEBUG org.apache.beam.vendor.grpc.v1_13_1.io.grpc.netty.NettyServerHandler - [id: 0xd975c621, L:/0:0:0:0:0:0:0:1:37459 - R:/0:0:0:0:0:0:0:1:54512] OUTBOUND PING: ack=true bytes=10
2018-12-19 11:41:51,017 DEBUG org.apache.beam.vendor.grpc.v1_13_1.io.grpc.netty.NettyServerHandler - [id: 0xd975c621, L:/0:0:0:0:0:0:0:1:37459 - R:/0:0:0:0:0:0:0:1:54512] OUTBOUND GO_AWAY: lastStreamId=1 errorCode=11 length=14 bytes=746f6f5f6d616e795f70696e6773
2018-12-19 11:41:51,031 DEBUG org.apache.beam.vendor.grpc.v1_13_1.io.grpc.netty.NettyServerHandler - [id: 0xd975c621, L:/0:0:0:0:0:0:0:1:37459 ! R:/0:0:0:0:0:0:0:1:54512] OUTBOUND RST_STREAM: streamId=1 errorCode=8
2018-12-19 11:41:51,031 ERROR org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer - Failed to handle for unknown endpoint
org.apache.beam.vendor.grpc.v1_13_1.io.grpc.StatusRuntimeException: CANCELLED: cancelled before receiving half close
at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.Status.asRuntimeException(Status.java:517)
at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onCancel(ServerCalls.java:272)
at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.PartialForwardingServerCallListener.onCancel(PartialForwardingServerCallListener.java:40)
at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.ForwardingServerCallListener.onCancel(ForwardingServerCallListener.java:23)
at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onCancel(ForwardingServerCallListener.java:40)
at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.Contexts$ContextualizedServerCallListener.onCancel(Contexts.java:96)
at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.closed(ServerCallImpl.java:293)
at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1Closed.runInContext(ServerImpl.java:738)
at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at org.apache.beam.vendor.grpc.v1_13_1.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
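For context on the OUTBOUND GO_AWAY line above: errorCode=11 is HTTP/2 ENHANCE_YOUR_CALM, and the frame's debug payload is plain ASCII for "too_many_pings". The server is saying the client sent keepalive pings more often than its ping policy allows; by default, grpc-java's Netty server permits client keepalive pings only once every few minutes. Below is a minimal Python sketch of a client channel configured to ping conservatively. The endpoint and numeric values are illustrative and not taken from the Beam code, but the channel-arg names are gRPC core's real keepalive options.

import grpc

# The GOAWAY debug data in the server log is just ASCII:
print(bytes.fromhex("746f6f5f6d616e795f70696e6773").decode())  # -> too_many_pings

# Hypothetical endpoint; values chosen to stay under a strict server ping policy.
options = [
    ("grpc.keepalive_time_ms", 300000),          # ping at most every 5 minutes
    ("grpc.keepalive_timeout_ms", 20000),        # wait 20 s for the ping ack
    ("grpc.keepalive_permit_without_calls", 0),  # no pings while no call is active
    ("grpc.http2.max_pings_without_data", 2),    # bound pings sent without data frames
]
channel = grpc.insecure_channel("localhost:50051", options=options)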
Relevant logs on the client:
I1219 20:16:42.822471431 220 ev_epoll1_linux.cc:928] PS:0x55beb3b22bc8 END_WORKER:0x7f84ee563d90
I1219 20:16:42.822475861 62 call_combiner.cc:120] checking queue
I1219 20:16:42.822503665 220 ev_epoll1_linux.cc:908] .. mark pollset 0x55beb3b22bc8 inactive
I1219 20:16:42.822517153 62 call_combiner.cc:134] EXECUTING FROM QUEUE: closure=0x7f84c00030c0 error="No Error"
I1219 20:16:42.822527045 220 ev_epoll1_linux.cc:879] .. choose next poller to be 0x7f84d11f5d90
I1219 20:16:42.822540820 62 call_combiner.cc:69] ==> grpc_call_combiner_start() [0x7f84c0001e60] closure=0x7f84c00030f8 [continue recv_trailing_metadata] error="No Error"
I1219 20:16:42.822564311 220 ev_epoll1_linux.cc:990] .. remove worker
I1219 20:16:42.822571738 58 ev_epoll1_linux.cc:836] PS:0x55beb3695478 BEGIN_DONE:0x7f84d11f5d90 kick_state=DESIGNATED_POLLER shutdown=0 kicked_without_poller: 0
I1219 20:16:42.822564482 62 call_combiner.cc:78] size: 1 -> 2
I1219 20:16:42.822600785 220 completion_queue.cc:1063] RETURN_EVENT[0x55beb3b22b00]: OP_COMPLETE: tag:0x7f84f78f3d70 OK
I1219 20:16:42.822614871 62 call_combiner.cc:93] QUEUING
I1219 20:16:42.822636361 62 call_combiner.cc:106] ==> grpc_call_combiner_stop() [0x7f84c0001e60] [recv_initial_metadata_ready]
I1219 20:16:42.822653326 62 call_combiner.cc:113] size: 2 -> 1
I1219 20:16:42.822668732 62 call_combiner.cc:120] checking queue
I1219 20:16:42.822687261 62 call_combiner.cc:134] EXECUTING FROM QUEUE: closure=0x7f84c00030f8 error="No Error"
I1219 20:16:42.822711978 62 completion_queue.cc:699] cq_end_op_for_next(cq=0x55beb3b22b00, tag=0x7f83ee0befb0, error="No Error", done=0x7f84f835d3f0, done_arg=0x7f84c0004390, storage=0x7f84c00043e0)
D1219 20:16:42.822741423 62 ev_epoll1_linux.cc:1081] PS:0x55beb3b22bc8 KICK:(nil) curps=0x7f84bc003258 curworker=(nil) root=(nil)
I1219 20:16:42.822757894 62 ev_epoll1_linux.cc:1092] .. kicked_without_poller
I1219 20:16:42.822778130 62 call_combiner.cc:106] ==> grpc_call_combiner_stop() [0x7f84c0001e60] [recv_trailing_metadata_ready]
I1219 20:16:42.822796773 62 call_combiner.cc:113] size: 1 -> 0
I1219 20:16:42.822816656 62 call_combiner.cc:141] queue empty
D1219 20:16:42.822844119 62 call.cc:719] set_final_status CLI
D1219 20:16:42.822876527 62 call.cc:720] {"created":"@1545250602.822833237","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1036,"grpc_message":"Socket closed","grpc_status":14}
I1219 20:16:42.822907338 62 completion_queue.cc:699] cq_end_op_for_next(cq=0x55beb3b22b00, tag=0x7f83ee245470, error="No Error", done=0x7f84f835d3f0, done_arg=0x7f84c0002d20, storage=0x7f84c0002d70)
D1219 20:16:42.822931677 62 flow_control.cc:113] 0x7f84c000af18[0][cli] | t updt sent | trw: 1047380, ttw: 4194309, taw: 4194309, srw: , slw: , saw:
I1219 20:16:42.822955643 62 chttp2_transport.cc:852] W:0x7f84c00098d0 CLIENT state WRITING -> WRITING [begin write in current thread]
I1219 20:16:42.823004118 220 completion_queue.cc:963] grpc_completion_queue_next(cq=0x55beb3b22b00, deadline=gpr_timespec { tv_sec: 1545250603, tv_nsec: 22984696, clock_type: 1 }, reserved=(nil))
I1219 20:16:42.823046652 58 ev_epoll1_linux.cc:728] ps: 0x55beb3695478 poll got 1 events
I1219 20:16:42.823048633 62 subchannel_list.h:447] [pick_first 0x7f84c40024e0] subchannel list 0x7f84c4013ff0 index 0 of 2 (subchannel 0x7f84c4003800): connectivity changed: state=TRANSIENT_FAILURE, error={"created":"@1545250602.821364969","description":"GOAWAY received","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":1147,"http2_error":11,"raw_bytes":"too_many_pings"}, shutting_down=0
D1219 20:16:42.823063371 220 ev_epoll1_linux.cc:1081] PS:0x55beb3b22bc8 KICK:(nil) curps=(nil) curworker=(nil) root=(nil)
E1219 20:16:42.823092992 58 lockfree_event.cc:213] LockfreeEvent::SetReady: 0x7f84c4000c88 curr=(nil)
I1219 20:16:42.823101846 62 pick_first.cc:461] Pick First 0x7f84c40024e0 connectivity changed for selected subchannel
I1219 20:16:42.823121047 62 connectivity_state.cc:164] SET: 0x7f84c40026c8 pick_first: READY --> IDLE [selected_changed+reresolve] error=(nil) "No Error"
I1219 20:16:42.823114973 220 ev_epoll1_linux.cc:1092] .. kicked_without_poller
E1219 20:16:42.823115591 58 lockfree_event.cc:213] LockfreeEvent::SetReady: 0x7f84c4000c78 curr=0x2
I1219 20:16:42.823137959 62 connectivity_state.cc:190] NOTIFY: 0x7f84c40026c8 pick_first: 0x7f84c0007c18
I1219 20:16:42.823150194 220 completion_queue.cc:1063] RETURN_EVENT[0x55beb3b22b00]: OP_COMPLETE: tag:0x7f83ee0befb0 OK
E1219 20:16:42.823167604 58 lockfree_event.cc:213] LockfreeEvent::SetReady: 0x7f84c4000c80 curr=0x2
I1219 20:16:42.823177467 62 lb_policy.cc:47] pick_first 0x7f84c40024e0: scheduling re-resolution closure with error="No Error".
I1219 20:16:42.823189882 58 ev_epoll1_linux.cc:928] PS:0x55beb3695478 END_WORKER:0x7f84d11f5d90
I1219 20:16:42.823195980 62 subchannel_list.h:374] [pick_first 0x7f84c40024e0] subchannel list 0x7f84c4013ff0 index 0 of 2 (subchannel 0x7f84c4003800): stopping connectivity watch
I1219 20:16:42.823220322 58 ev_epoll1_linux.cc:908] .. mark pollset 0x55beb3695478 inactive
I1219 20:16:42.823234335 62 client_channel.cc:214] chand=0x55beb3b09e78: lb_policy=0x7f84c40024e0 state changed to IDLE
I1219 20:16:42.823238624 58 ev_epoll1_linux.cc:990] .. remove worker
I1219 20:16:42.823261347 62 client_channel.cc:196] chand=0x55beb3b09e78: setting connectivity state to IDLE
I1219 20:16:42.823281386 58 ev_epoll1_linux.cc:748] PS:0x55beb3695478 BEGIN_STARTS:0x7f84d11f5d90
I1219 20:16:42.823298138 62 connectivity_state.cc:164] SET: 0x55beb3b09f00 client_channel: READY --> IDLE [lb_changed] error=(nil) "No Error"
I1219 20:16:42.823298643 58 ev_epoll1_linux.cc:767] PS:0x55beb3695478 BEGIN_REORG:0x7f84d11f5d90 kick_state=UNKICKED is_reassigning=1
I1219 20:16:42.823340813 62 connectivity_state.cc:116] CONWATCH: 0x7f84c40026c8 pick_first: from IDLE [cur=IDLE] notify=0x7f8490001b48
I1219 20:16:42.823354538 58 ev_epoll1_linux.cc:836] PS:0x55beb3695478 BEGIN_DONE:0x7f84d11f5d90 kick_state=DESIGNATED_POLLER shutdown=0 kicked_without_poller: 0
I1219 20:16:42.823358721 62 client_channel.cc:306] chand=0x55beb3b09e78: started name re-resolving
D1219 20:16:42.823394577 62 dns_resolver.cc:279] Start resolving.
I1219 20:16:42.823418368 62 executor.cc:243] EXECUTOR (resolver-executor) try to schedule 0x7f8490001000 (short) to thread 0
I1219 20:16:42.823451488 62 chttp2_transport.cc:852] W:0x7f84c00098d0 CLIENT state WRITING -> IDLE [finish writing]
I1219 20:16:42.823470502 44 executor.cc:188] EXECUTOR (resolver-executor) [0]: execute
I1219 20:16:42.823473482 62 chttp2_transport.cc:2790] set connectivity_state=4
I1219 20:16:42.823518606 44 executor.cc:73] EXECUTOR (resolver-executor) run 0x7f8490001000
I1219 20:16:42.823586626 62 connectivity_state.cc:164] SET: 0x7f84c0009b78 client_transport: TRANSIENT_FAILURE --> SHUTDOWN [close_transport] error=0x7f84900014f0 {"created":"@1545250602.822223145","description":"Delayed close due to in-progress write","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":609,"referenced_errors":[{"created":"@1545250602.821928707","description":"Endpoint read failed","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":2510,"occurred_during_write":0,"referenced_errors":[{"created":"@1545250602.821703024","description":"Socket closed","fd":6,"file":"src/core/lib/iomgr/tcp_posix.cc","file_line":465,"grpc_status":14,"target_address":"ipv6:[::1]:44439"},{"created":"@1545250602.821364969","description":"GOAWAY received","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":1147,"http2_error":11,"raw_bytes":"too_many_pings"}]}]}
I1219 20:16:42.823608583 62 timer_generic.cc:467] TIMER 0x7f84c000b128: CANCEL pending=true
I1219 20:16:42.823626241 62 ev_posix.cc:269] (fd-trace) fd_shutdown(6)
E1219 20:16:42.823656433 62 lockfree_event.cc:164] LockfreeEvent::SetShutdown: 0x7f84c4000c78 curr=0x2 err={"created":"@1545250602.822223145","description":"Delayed close due to in-progress write","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":609,"referenced_errors":[{"created":"@1545250602.821928707","description":"Endpoint read failed","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":2510,"occurred_during_write":0,"referenced_errors":[{"created":"@1545250602.821703024","description":"Socket closed","fd":6,"file":"src/core/lib/iomgr/tcp_posix.cc","file_line":465,"grpc_status":14,"target_address":"ipv6:[::1]:44439"},{"created":"@1545250602.821364969","description":"GOAWAY received","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":1147,"http2_error":11,"raw_bytes":"too_many_pings"}]}]}
E1219 20:16:42.823691701 62 lockfree_event.cc:164] LockfreeEvent::SetShutdown: 0x7f84c4000c80 curr=0x2 err={"created":"@1545250602.822223145","description":"Delayed close due to in-progress write","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":609,"referenced_errors":[{"created":"@1545250602.821928707","description":"Endpoint read failed","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":2510,"occurred_during_write":0,"referenced_errors":[{"created":"@1545250602.821703024","description":"Socket closed","fd":6,"file":"src/core/lib/iomgr/tcp_posix.cc","file_line":465,"grpc_status":14,"target_address":"ipv6:[::1]:44439"},{"created":"@1545250602.821364969","description":"GOAWAY received","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":1147,"http2_error":11,"raw_bytes":"too_many_pings"}]}]}
I1219 20:16:42.823703027 58 ev_epoll1_linux.cc:728] ps: 0x55beb3695478 poll got 1 events
E1219 20:16:42.823713652 62 lockfree_event.cc:164] LockfreeEvent::SetShutdown: 0x7f84c4000c88 curr=0x2 err={"created":"@1545250602.822223145","description":"Delayed close due to in-progress write","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":609,"referenced_errors":[{"created":"@1545250602.821928707","description":"Endpoint read failed","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":2510,"occurred_during_write":0,"referenced_errors":[{"created":"@1545250602.821703024","description":"Socket closed","fd":6,"file":"src/core/lib/iomgr/tcp_posix.cc","file_line":465,"grpc_status":14,"target_address":"ipv6:[::1]:44439"},{"created":"@1545250602.821364969","description":"GOAWAY received","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":1147,"http2_error":11,"raw_bytes":"too_many_pings"}]}]}
Top GitHub Comments
@pkwarren, sent out #5795 to remove the API.
grpc/grpc#17667 has been resolved. That should fix this issue.
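Since the root cause was fixed upstream in gRPC core (grpc/grpc#17667), the practical remedy on the client side is to upgrade grpcio beyond 1.12.0 to a release that carries that fix. A quick sanity check, assuming a standard grpcio install:

import grpc
print(grpc.__version__)  # this report was produced with grpcio 1.12.0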