
Linkerd times out requests following k8s pod deletion


Summary

Using a modified Linkerd1 lifecycle environment (commit), we observed slow-cooker requests timing out, with the downstream Linkerd reporting CancelledRequestException, followed by “Failed mid-stream. Terminating stream, closing connection” with a ChannelClosedException.

This test environment deletes a bb-terminus pod every 30 seconds, while the upstream slow-cooker and bb-p2p-broadcast pods continuously attempt to make requests.

Lifecycle Requests (diagram)

Steps to reproduce

Deploy

cat linkerd.yml | kubectl apply -f - && bin/deploy 1

Observe

Around 2018-09-26T17:29:37, slow-cooker requests start timing out (full log):

2018-09-26T17:29:27Z     10/0/0 10 100% 10s  18 [ 20  21  21   21 ]   21      0
2018-09-26T17:29:37Z      0/1/0 10  10% 10s  37 [ 37  37  37   37 ]   37      0
Get http://bb-p2p.lifecycle1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2018-09-26T17:29:47Z      0/0/1 10   0% 10s   0 [  0   0   0    0 ]    0      0 -
Get http://bb-p2p.lifecycle1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2018-09-26T17:29:57Z      0/0/1 10   0% 10s   0 [  0   0   0    0 ]    0      0 -
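
The “net/http: request canceled (Client.Timeout exceeded while awaiting headers)” errors above are produced by Go’s standard HTTP client when its overall request timeout elapses before response headers arrive. A minimal Go sketch of what slow-cooker is effectively doing here (the 10-second timeout and the bare URL are assumptions for illustration, not slow-cooker’s actual configuration):

package main

import (
    "fmt"
    "net/http"
    "time"
)

func main() {
    // Client.Timeout bounds the whole request; if the proxy holds the request
    // longer than this, Get returns a "Client.Timeout exceeded while awaiting
    // headers" error like the ones in the slow-cooker log (exact wording
    // varies by Go version).
    client := &http.Client{Timeout: 10 * time.Second}

    resp, err := client.Get("http://bb-p2p.lifecycle1")
    if err != nil {
        fmt.Println("request failed:", err)
        return
    }
    defer resp.Body.Close()
    fmt.Println("status:", resp.Status)
}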

linkerd log

D 0926 17:29:29.275 UTC THREAD25: [S L:/10.233.66.68:4340 R:/10.233.66.77:39628 S:2501] stream closed
D 0926 17:29:39.267 UTC THREAD30 TraceId:92a26f0c24a89eff: Exception propagated to the default monitor (upstream address: /10.233.66.72:59644, downstream address: /10.233.66.68:4141, label: %/io.l5d.k8s.daemonset/linkerd/http-incoming/l5d/#/io.l5d.k8s/lifecycle1/http/bb-p2p).
com.twitter.finagle.CancelledRequestException: request cancelled. Remote Info: Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: /10.233.66.77:7070, Downstream label: %/io.l5d.k8s.localnode/10.233.66.68/#/io.l5d.k8s/lifecycle1/http/bb-p2p, Trace Id: 92a26f0c24a89eff.d7d924a2654069fb<:d844b6d81c2331f9

D 0926 17:29:39.268 UTC THREAD26 TraceId:92a26f0c24a89eff: Exception propagated to the default monitor (upstream address: /10.233.66.68:47320, downstream address: /10.233.66.77:7070, label: %/io.l5d.k8s.localnode/10.233.66.68/#/io.l5d.k8s/lifecycle1/http/bb-p2p).
com.twitter.finagle.CancelledRequestException: request cancelled. Remote Info: Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: /10.233.66.77:7070, Downstream label: %/io.l5d.k8s.localnode/10.233.66.68/#/io.l5d.k8s/lifecycle1/http/bb-p2p, Trace Id: 92a26f0c24a89eff.d7d924a2654069fb<:d844b6d81c2331f9

D 0926 17:29:39.271 UTC THREAD25 TraceId:92a26f0c24a89eff: Failed mid-stream. Terminating stream, closing connection
com.twitter.finagle.ChannelClosedException: ChannelException at remote address: /10.233.66.68:4141. Remote Info: Not Available
	at com.twitter.finagle.netty4.transport.ChannelTransport$$anon$2.channelInactive(ChannelTransport.scala:196)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
	at io.netty.handler.codec.MessageAggregator.channelInactive(MessageAggregator.java:417)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelInactive(CombinedChannelDuplexHandler.java:420)
	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:377)
	at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:342)
	at io.netty.handler.codec.http.HttpClientCodec$Decoder.channelInactive(HttpClientCodec.java:281)
	at io.netty.channel.CombinedChannelDuplexHandler.channelInactive(CombinedChannelDuplexHandler.java:223)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
	at com.twitter.finagle.netty4.channel.ChannelRequestStatsHandler.channelInactive(ChannelRequestStatsHandler.scala:41)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
	at com.twitter.finagle.netty4.channel.ChannelStatsHandler.channelInactive(ChannelStatsHandler.scala:148)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1354)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:917)
	at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:822)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at com.twitter.finagle.util.BlockingTimeTrackingThreadFactory$$anon$1.run(BlockingTimeTrackingThreadFactory.scala:23)
	at java.lang.Thread.run(Thread.java:748)

D 0926 17:29:39.271 UTC THREAD27 TraceId:92a26f0c24a89eff: Failed mid-stream. Terminating stream, closing connection
com.twitter.finagle.ChannelClosedException: ChannelException at remote address: /10.233.66.77:7070. Remote Info: Not Available
	at com.twitter.finagle.netty4.transport.ChannelTransport$$anon$2.channelInactive(ChannelTransport.scala:196)

...

E 0926 17:29:39.275 UTC THREAD26 TraceId:92a26f0c24a89eff: service failure: com.twitter.finagle.CancelledRequestException: request cancelled. Remote Info: Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: /10.233.66.77:7070, Downstream label: %/io.l5d.k8s.localnode/10.233.66.68/#/io.l5d.k8s/lifecycle1/http/bb-p2p, Trace Id: 92a26f0c24a89eff.d7d924a2654069fb<:d844b6d81c2331f9
E 0926 17:29:39.275 UTC THREAD30 TraceId:92a26f0c24a89eff: service failure: com.twitter.finagle.CancelledRequestException: request cancelled. Remote Info: Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: /10.233.66.77:7070, Downstream label: %/io.l5d.k8s.localnode/10.233.66.68/#/io.l5d.k8s/lifecycle1/http/bb-p2p, Trace Id: 92a26f0c24a89eff.d7d924a2654069fb<:d844b6d81c2331f9
D 0926 17:29:39.287 UTC THREAD25: [S L:/10.233.66.68:4340 R:/10.233.66.77:39628 S:2503] initialized stream
D 0926 17:29:39.290 UTC THREAD25 TraceId:69fac7e51063f2d1: k8s lookup: /lifecycle1/grpc/bb-terminus1 /lifecycle1/grpc/bb-terminus1
D 0926 17:29:39.294 UTC THREAD25: [S L:/10.233.66.68:4340 R:/10.233.66.77:39628 S:2503] stream closed
D 0926 17:29:41.621 UTC THREAD29 TraceId:c8f97d6b63697a48: k8s ns lifecycle1 service bb-terminus1 modified endpoints
D 0926 17:29:41.630 UTC THREAD28 TraceId:630292ce49f848be: k8s ns lifecycle1 service bb-terminus2 modified endpoints
D 0926 17:29:49.263 UTC THREAD29 TraceId:f97a23c7f3f8ee43: Exception propagated to the default monitor (upstream address: /10.233.66.72:38982, downstream address: /10.233.66.68:4141, label: %/io.l5d.k8s.daemonset/linkerd/http-incoming/l5d/#/io.l5d.k8s/lifecycle1/http/bb-p2p).
com.twitter.finagle.CancelledRequestException: request cancelled. Remote Info: Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: /10.233.66.77:7070, Downstream label: %/io.l5d.k8s.localnode/10.233.66.68/#/io.l5d.k8s/lifecycle1/http/bb-p2p, Trace Id: f97a23c7f3f8ee43.898e458de51e998c<:ec154c3d40d22178

D 0926 17:29:49.263 UTC THREAD30 TraceId:f97a23c7f3f8ee43: Failed mid-stream. Terminating stream, closing connection
com.twitter.finagle.ChannelClosedException: ChannelException at remote address: /10.233.66.68:4141. Remote Info: Not Available
	at com.twitter.finagle.netty4.transport.ChannelTransport$$anon$2.channelInactive(ChannelTransport.scala:196)

linkerd metrics.json

https://gist.github.com/siggy/9b0b7784db5df929512631cffc010f85

linkerd client_state.json

https://gist.github.com/siggy/29ac1a5fd3c4639a42e05cc04f146042

linkerd namerd_state.json

https://gist.github.com/siggy/1180f911e2e16c929940b1dc1ef11e19

bb log

time="2018-09-26T17:29:28Z" level=info msg="Making request to [147.75.70.161:4340 / bb-terminus2.lifecycle1]"
time="2018-09-26T17:29:28Z" level=info msg="Making request to [147.75.70.161:4340 / bb-terminus1.lifecycle1]"
time="2018-09-26T17:29:28Z" level=error msg="Error when broadcasting request [requestUID:\"in:http-sid:broadcast-channel-grpc:-1-h1:7070-266590609\" ] to client [147.75.70.161:4340 / bb-terminus2.lifecycle1]: rpc error: code = Internal desc = transport: received the unexpected content-type \"text/plain\""
time="2018-09-26T17:29:28Z" level=error msg="Error when broadcasting request [requestUID:\"in:http-sid:broadcast-channel-grpc:-1-h1:7070-266590609\" ] to client [147.75.70.161:4340 / bb-terminus1.lifecycle1]: rpc error: code = Internal desc = transport: received the unexpected content-type \"text/plain\""
time="2018-09-26T17:29:28Z" level=info msg="Finished broadcast"
time="2018-09-26T17:29:28Z" level=error msg="Error while handling HTTP request: error handling http request: downstream server [147.75.70.161:4340 / bb-terminus2.lifecycle1] returned error: rpc error: code = Internal desc = transport: received the unexpected content-type \"text/plain\",downstream server [147.75.70.161:4340 / bb-terminus1.lifecycle1] returned error: rpc error: code = Internal desc = transport: received the unexpected content-type \"text/plain\""
time="2018-09-26T17:29:29Z" level=info msg="Received request with empty body, assigning new request UID [in:http-sid:broadcast-channel-grpc:-1-h1:7070-267164731] to it"
time="2018-09-26T17:29:29Z" level=debug msg="Received HTTP request [in:http-sid:broadcast-channel-grpc:-1-h1:7070-267164731] [&{Method:GET URL:/ Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[User-Agent:[Go-http-client/1.1] Sc-Req-Id:[1251] Via:[1.1 linkerd, 1.1 linkerd] L5d-Dst-Client:[/%/io.l5d.k8s.localnode/10.233.66.68/#/io.l5d.k8s/lifecycle1/http/bb-p2p] Content-Length:[0] L5d-Dst-Service:[/svc/bb-p2p.lifecycle1] L5d-Ctx-Trace:[19kkomVAafvYRLbYHCMx+ZKibwwkqJ7/AAAAAAAAAAA=] L5d-Reqid:[92a26f0c24a89eff]] Body:{} GetBody:<nil> ContentLength:0 TransferEncoding:[] Close:false Host:bb-p2p.lifecycle1 Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr:10.233.66.68:47720 RequestURI:/ TLS:<nil> Cancel:<nil> Response:<nil> ctx:0xc0005b9940}] Context [context.Background.WithValue(&http.contextKey{name:\"http-server\"}, &http.Server{Addr:\":7070\", Handler:(*protocols.httpHandler)(0xc00000e0a8), TLSConfig:(*tls.Config)(0xc0001d0000), ReadTimeout:0, ReadHeaderTimeout:0, WriteTimeout:0, IdleTimeout:0, MaxHeaderBytes:0, TLSNextProto:map[string]func(*http.Server, *tls.Conn, http.Handler){\"h2\":(func(*http.Server, *tls.Conn, http.Handler))(0x72a820)}, ConnState:(func(net.Conn, http.ConnState))(nil), ErrorLog:(*log.Logger)(nil), disableKeepAlives:0, inShutdown:0, nextProtoOnce:sync.Once{m:sync.Mutex{state:0, sema:0x0}, done:0x1}, nextProtoErr:error(nil), mu:sync.Mutex{state:0, sema:0x0}, listeners:map[*net.Listener]struct {}{(*net.Listener)(0xc0001a4080):struct {}{}}, activeConn:map[*http.conn]struct {}{(*http.conn)(0xc0002aa000):struct {}{}}, doneChan:(chan struct {})(nil), onShutdown:[]func(){(func())(0x7337f0)}}).WithValue(&http.contextKey{name:\"local-addr\"}, &net.TCPAddr{IP:net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xa, 0xe9, 0x42, 0x4d}, Port:7070, Zone:\"\"}).WithCancel.WithCancel] Body [{RequestUID:in:http-sid:broadcast-channel-grpc:-1-h1:7070-267164731 XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}]"
time="2018-09-26T17:29:29Z" level=info msg="Starting broadcast to [2] downstream services"
time="2018-09-26T17:29:29Z" level=info msg="Making request to [147.75.70.161:4340 / bb-terminus2.lifecycle1]"
time="2018-09-26T17:29:29Z" level=info msg="Making request to [147.75.70.161:4340 / bb-terminus1.lifecycle1]"
time="2018-09-26T17:29:29Z" level=error msg="Error when broadcasting request [requestUID:\"in:http-sid:broadcast-channel-grpc:-1-h1:7070-267164731\" ] to client [147.75.70.161:4340 / bb-terminus1.lifecycle1]: rpc error: code = Internal desc = transport: received the unexpected content-type \"text/plain\""
time="2018-09-26T17:29:39Z" level=error msg="Error when broadcasting request [requestUID:\"in:http-sid:broadcast-channel-grpc:-1-h1:7070-267164731\" ] to client [147.75.70.161:4340 / bb-terminus2.lifecycle1]: rpc error: code = Canceled desc = context canceled"
time="2018-09-26T17:29:39Z" level=info msg="Finished broadcast"
time="2018-09-26T17:29:39Z" level=error msg="Error while handling HTTP request: error handling http request: downstream server [147.75.70.161:4340 / bb-terminus1.lifecycle1] returned error: rpc error: code = Internal desc = transport: received the unexpected content-type \"text/plain\",downstream server [147.75.70.161:4340 / bb-terminus2.lifecycle1] returned error: rpc error: code = Canceled desc = context canceled"
time="2018-09-26T17:29:39Z" level=info msg="Received request with empty body, assigning new request UID [in:http-sid:broadcast-channel-grpc:-1-h1:7070-286270779] to it"
time="2018-09-26T17:29:39Z" level=debug msg="Received HTTP request [in:http-sid:broadcast-channel-grpc:-1-h1:7070-286270779] [&{Method:GET URL:/ Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[L5d-Dst-Client:[/%/io.l5d.k8s.localnode/10.233.66.68/#/io.l5d.k8s/lifecycle1/http/bb-p2p] Content-Length:[0] L5d-Reqid:[f97a23c7f3f8ee43] Via:[1.1 linkerd, 1.1 linkerd] Sc-Req-Id:[1252] L5d-Dst-Service:[/svc/bb-p2p.lifecycle1] L5d-Ctx-Trace:[iY5FjeUemYzsFUw9QNIhePl6I8fz+O5DAAAAAAAAAAA=] User-Agent:[Go-http-client/1.1]] Body:{} GetBody:<nil> ContentLength:0 TransferEncoding:[] Close:false Host:bb-p2p.lifecycle1 Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr:10.233.66.68:55272 RequestURI:/ TLS:<nil> Cancel:<nil> Response:<nil> ctx:0xc0003dfb00}] Context [context.Background.WithValue(&http.contextKey{name:\"http-server\"}, &http.Server{Addr:\":7070\", Handler:(*protocols.httpHandler)(0xc00000e0a8), TLSConfig:(*tls.Config)(0xc0001d0000), ReadTimeout:0, ReadHeaderTimeout:0, WriteTimeout:0, IdleTimeout:0, MaxHeaderBytes:0, TLSNextProto:map[string]func(*http.Server, *tls.Conn, http.Handler){\"h2\":(func(*http.Server, *tls.Conn, http.Handler))(0x72a820)}, ConnState:(func(net.Conn, http.ConnState))(nil), ErrorLog:(*log.Logger)(nil), disableKeepAlives:0, inShutdown:0, nextProtoOnce:sync.Once{m:sync.Mutex{state:0, sema:0x0}, done:0x1}, nextProtoErr:error(nil), mu:sync.Mutex{state:0, sema:0x0}, listeners:map[*net.Listener]struct {}{(*net.Listener)(0xc0001a4080):struct {}{}}, activeConn:map[*http.conn]struct {}{(*http.conn)(0xc000712be0):struct {}{}}, doneChan:(chan struct {})(nil), onShutdown:[]func(){(func())(0x7337f0)}}).WithValue(&http.contextKey{name:\"local-addr\"}, &net.TCPAddr{IP:net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xa, 0xe9, 0x42, 0x4d}, Port:7070, Zone:\"\"}).WithCancel.WithCancel] Body [{RequestUID:in:http-sid:broadcast-channel-grpc:-1-h1:7070-286270779 XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}]"
time="2018-09-26T17:29:39Z" level=info msg="Starting broadcast to [2] downstream services"
time="2018-09-26T17:29:39Z" level=info msg="Making request to [147.75.70.161:4340 / bb-terminus2.lifecycle1]"
time="2018-09-26T17:29:39Z" level=info msg="Making request to [147.75.70.161:4340 / bb-terminus1.lifecycle1]"
time="2018-09-26T17:29:39Z" level=error msg="Error when broadcasting request [requestUID:\"in:http-sid:broadcast-channel-grpc:-1-h1:7070-286270779\" ] to client [147.75.70.161:4340 / bb-terminus1.lifecycle1]: rpc error: code = Internal desc = transport: received the unexpected content-type \"text/plain\""
time="2018-09-26T17:29:49Z" level=error msg="Error when broadcasting request [requestUID:\"in:http-sid:broadcast-channel-grpc:-1-h1:7070-286270779\" ] to client [147.75.70.161:4340 / bb-terminus2.lifecycle1]: rpc error: code = Canceled desc = context canceled"
time="2018-09-26T17:29:49Z" level=info msg="Finished broadcast"
time="2018-09-26T17:29:49Z" level=error msg="Error while handling HTTP request: error handling http request: downstream server [147.75.70.161:4340 / bb-terminus1.lifecycle1] returned error: rpc error: code = Internal desc = transport: received the unexpected content-type \"text/plain\",downstream server [147.75.70.161:4340 / bb-terminus2.lifecycle1] returned error: rpc error: code = Canceled desc = context canceled"

redeployer (pod deletion) log

2018-09-26T17:27:56.642975698Z sleeping for 30 seconds...
2018-09-26T17:28:26.83007554Z found 1 running pods
2018-09-26T17:28:27.076606954Z pod "bb-terminus-78897bc464-z8h48" deleted
2018-09-26T17:28:27.083966334Z sleeping for 30 seconds...
2018-09-26T17:28:57.300890609Z found 1 running pods
2018-09-26T17:28:57.523259717Z pod "bb-terminus-78897bc464-687nt" deleted
2018-09-26T17:28:57.529930851Z sleeping for 30 seconds...
2018-09-26T17:29:27.860318854Z found 1 running pods
2018-09-26T17:29:28.113539307Z pod "bb-terminus-78897bc464-q4wxz" deleted
2018-09-26T17:29:28.12074857Z sleeping for 30 seconds...
2018-09-26T17:29:58.294564828Z found 1 running pods
2018-09-26T17:29:58.535772668Z pod "bb-terminus-78897bc464-vblj2" deleted
2018-09-26T17:29:58.543236254Z sleeping for 30 seconds...

Pod deletion command: https://github.com/linkerd/linkerd-examples/blob/ed076ee1cc378de5b3823d1efaeb86c08352d3b4/lifecycle/linkerd1/lifecycle.yml#L398

Possibly related to #2079

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 1
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

siggy commented on Oct 4, 2018 (1 reaction)

@zoltrain Quick update: we’re continuing to dig into this and have some promising leads. Stay tuned.

zoltrain commented on Oct 1, 2018 (1 reaction)

No problem.

@siggy I’ve done some more digging today. I figured “maybe this has been introduced recently”, so I’ve been retroactively testing the minor release changes, using the MaxConnectionAge setup I had on Friday. Good news: it fails consistently, and I also found that something introduced between 1.4.3 and 1.4.4 may be causing this issue. I tested 1.4.4 and it hits the failures after around the 10-minute mark; I’m currently running 1.4.3 and it’s been going for 30 minutes without failure. I dug into the commit history between those releases and there was a lot of work done on H2 streams/connections: Finagle was upgraded, a stream buffer was replaced, and a status code guard for resets was removed.

If I had to guess, I’d say one of these might be causing this issue:

https://github.com/linkerd/linkerd/commit/9876e3da13f1ab29365926e8455c334106397256
https://github.com/linkerd/linkerd/commit/ac64c5991df2d008c4e6855982273eca4e63f51c
https://github.com/linkerd/linkerd/commit/62aa66e5d0c6abec77f6289d5d9249a102928397

I’m no Scala expert, so can’t really comment much on what’s going on in the above. I’ll leave that to the experts.

I’m going to run 1.4.3 continuously this afternoon to make sure it doesn’t eventually hit those same failures. If it holds up, we’ll look at downgrading our release to hopefully stop this while the source is tracked down.
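
For context on the MaxConnectionAge setup referenced above: it is a standard gRPC server keepalive option that forcibly recycles connections after a given age, which exercises the same connection-teardown path as deleting a pod. A minimal grpc-go sketch (the durations and port are placeholders, not the configuration used in the report):

package main

import (
    "log"
    "net"
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/keepalive"
)

func main() {
    // Hypothetical durations; the comment above does not say which values were used.
    srv := grpc.NewServer(grpc.KeepaliveParams(keepalive.ServerParameters{
        MaxConnectionAge:      30 * time.Second, // force connections to be recycled
        MaxConnectionAgeGrace: 10 * time.Second, // let in-flight RPCs drain first
    }))

    lis, err := net.Listen("tcp", ":8080")
    if err != nil {
        log.Fatal(err)
    }
    log.Fatal(srv.Serve(lis))
}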
