linkerd 1.3.3 issue with gRPC
Issue Type:
- Bug report
What happened:
I updated linkerd from 1.1.2 to 1.3.3 in DCOS 1.9 and started getting the following errors in gRPC clients:
From the Go client:
2017/12/05 17:30:05 rpc error: code = ResourceExhausted desc = grpc: received message larger than max (845559858 vs. 4194304)
From the .NET Core client:
Unhandled Exception: Grpc.Core.RpcException: Status(StatusCode=Internal, Detail="Failed to deserialize response message.")
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Grpc.Core.Internal.AsyncCall`2.UnaryCall(TRequest msg)
at Grpc.Core.DefaultCallInvoker.BlockingUnaryCall[TRequest,TResponse](Method`2 method, String host, CallOptions options, TRequest request)
at Grpc.Core.Internal.InterceptingCallInvoker.BlockingUnaryCall[TRequest,TResponse](Method`2 method, String host, CallOptions options, TRequest request)
It looks like the message becomes invalid. The errors don’t happen every time; I have to call the endpoint several times to trigger one. The message itself is not that big, around 40 KB.
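For reference, the 4194304 in the Go error is gRPC-Go's default maximum receive message size (4 MB). Raising it on the client wouldn't be a real fix here, since the payload is only about 40 KB and the huge reported size suggests the length prefix itself is wrong, but for completeness this is a minimal sketch (placeholder address and limit, not from the original report) of how that limit is normally raised:

    package main

    import (
        "log"

        "google.golang.org/grpc"
    )

    func main() {
        // 4194304 (4 MB) is gRPC-Go's default maximum receive message size.
        // The address and the 16 MB limit below are placeholders, not from
        // the original report.
        conn, err := grpc.Dial(
            "linkerd.example:4140",
            grpc.WithInsecure(),
            grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(16*1024*1024)),
        )
        if err != nil {
            log.Fatalf("dial: %v", err)
        }
        defer conn.Close()
        // Create a generated stub from conn and issue calls as usual.
    }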
What you expected to happen:
How to reproduce it (as minimally and precisely as possible): I tried running linkerd locally, but couldn’t reproduce this issue.
Anything else we need to know?:
Environment:
- linkerd/namerd version, config files: linkerd 1.3.3, no namerd
- Platform, version, and config files (Kubernetes, DC/OS, etc): DCOS Enterprise 1.9
- Cloud provider or hardware configuration: AWS
linkerd configuration:
---
usage:
  enabled: false
admin:
  port: 9990
  ip: 0.0.0.0
namers:
- kind: io.l5d.marathon
  host: leader.mesos
  port: 443
  uriPrefix: "/marathon"
  prefix: "/io.l5d.marathon"
  tls:
    commonName: master.mesos
    trustCerts:
    - "/mnt/mesos/sandbox/.ssl/ca.crt"
routers:
- protocol: h2
  experimental: true
  identifier:
    kind: io.l5d.header.token
  dtab: "/ph=>/$/io.buoyant.rinet;/srv=>/$/io.buoyant.porthostPfx/ph;/svc=>/srv;/marathonId=>/#/io.l5d.marathon;/svc=>/$/io.buoyant.http.domainToPathPfx/marathonId"
  servers:
  - port: 4140
    ip: 0.0.0.0
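For context, the io.l5d.header.token identifier in this router names requests by a header value (the Host/:authority header by default), so gRPC clients address the local linkerd listener and set the authority to the target app. A minimal sketch under that assumption, with a hypothetical app name:

    package main

    import (
        "log"

        "google.golang.org/grpc"
    )

    func main() {
        // With the io.l5d.header.token identifier, linkerd takes the service
        // name from the Host/:authority header by default, so the client
        // dials the local linkerd listener (port 4140 in the config above)
        // and overrides the authority with the target app name.
        // "my-marathon-app" is a placeholder, not from the original report.
        conn, err := grpc.Dial(
            "localhost:4140",
            grpc.WithInsecure(),
            grpc.WithAuthority("my-marathon-app"),
        )
        if err != nil {
            log.Fatalf("dial linkerd: %v", err)
        }
        defer conn.Close()
        // Create a generated stub from conn and issue unary/streaming calls as usual.
    }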
Top GitHub Comments
@wmorgan I was able to troubleshoot it a bit further. I added some logging to the gRPC client to get HTTP/2 framing info. I call the same endpoint with the same parameters several times (unary call). Stream 59 is a successful one; stream 61 is where the error happens. When gRPC reads an HTTP/2 DATA frame it expects the payload to be length-prefixed (at least that is what gRPC expects), so it reads the first few bytes to get the message length, but the message on stream 61 doesn’t have the length header, so gRPC reads a few bytes of the message body as the length instead and throws the error. Are there any logs I can get you from linkerd? I still cannot reproduce the issue locally.
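For reference, the length prefix described above is gRPC's standard message framing inside HTTP/2 DATA frames: a one-byte compressed flag, a four-byte big-endian message length, and then the message bytes. A minimal sketch (not from the original report) of parsing that prefix, showing how a frame without the header makes the receiver treat payload bytes as an enormous length:

    package main

    import (
        "encoding/binary"
        "fmt"
    )

    // parseGRPCFrame parses gRPC's length-prefixed message framing:
    // 1 byte compressed flag, 4-byte big-endian message length, then the
    // message bytes. If the prefix is missing, the first payload bytes get
    // interpreted as the length, which is how a ~40 KB message can be
    // reported as hundreds of megabytes.
    func parseGRPCFrame(data []byte) (compressed bool, msg []byte, err error) {
        if len(data) < 5 {
            return false, nil, fmt.Errorf("short frame: %d bytes", len(data))
        }
        compressed = data[0] == 1
        length := binary.BigEndian.Uint32(data[1:5])
        if int(length) > len(data)-5 {
            return false, nil, fmt.Errorf("declared length %d exceeds available %d bytes", length, len(data)-5)
        }
        return compressed, data[5 : 5+length], nil
    }

    func main() {
        // Example: a 3-byte uncompressed message with a correct prefix.
        frame := []byte{0x00, 0x00, 0x00, 0x00, 0x03, 'a', 'b', 'c'}
        _, msg, err := parseGRPCFrame(frame)
        fmt.Println(string(msg), err)
    }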
I tried to reproduce it in my local cluster, but unfortunately had no luck. However, I can reliably reproduce it in two AWS DCOS clusters, one on 1.9 and another on 1.10.
Since the issue started happening between 1.3.2 and 1.3.3, I tried to narrow it down to a specific commit. Here is what I’ve got: I built two linkerd Docker images, one from commit https://github.com/linkerd/linkerd/commit/c6f0d2eaeecca80c60314e6f6cb852a31870877a and another from commit https://github.com/linkerd/linkerd/commit/0bd8a91ed51fecc34b86f110382e2077a7b88600. The issue happens with the first one but not with the second, so it looks like https://github.com/linkerd/linkerd/commit/c6f0d2eaeecca80c60314e6f6cb852a31870877a contributes to the issue somehow.