
linkerd 1.3.3 issue with gRPC

See original GitHub issue

Issue Type:

  • Bug report

What happened: I updated linkerd from 1.1.2 to 1.3.3 on DC/OS 1.9 and started getting the following errors in gRPC clients:

from go client:

2017/12/05 17:30:05 rpc error: code = ResourceExhausted desc = grpc: received message larger than max (845559858 vs. 4194304)

from .net core client:

Unhandled Exception: Grpc.Core.RpcException: Status(StatusCode=Internal, Detail="Failed to deserialize response message.")
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Grpc.Core.Internal.AsyncCall`2.UnaryCall(TRequest msg)
   at Grpc.Core.DefaultCallInvoker.BlockingUnaryCall[TRequest,TResponse](Method`2 method, String host, CallOptions options, TRequest request)
   at Grpc.Core.Internal.InterceptingCallInvoker.BlockingUnaryCall[TRequest,TResponse](Method`2 method, String host, CallOptions options, TRequest request)

It looks like the message becomes corrupted. The errors don’t happen every time; I need to call the endpoint multiple times to trigger one. The message itself is not that big, around 40 KB.
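The 4194304 in the error is gRPC-Go's default per-call receive limit (4 MiB). The sketch below mimics, with stdlib code only, the comparison gRPC performs after reading a message's 5-byte prefix (1 compression-flag byte followed by a 4-byte big-endian length); the constant and helper names here are illustrative, not grpc-go's actual internal identifiers.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// defaultMaxRecvMsgSize mirrors grpc-go's default client receive
// limit of 4 MiB, the 4194304 figure in the error above.
const defaultMaxRecvMsgSize = 4 * 1024 * 1024

// checkRecvSize performs the size check gRPC applies after reading the
// 5-byte message prefix: byte 0 is the compression flag, bytes 1..4
// are the big-endian message length.
func checkRecvSize(prefix [5]byte) error {
	length := binary.BigEndian.Uint32(prefix[1:5])
	if length > defaultMaxRecvMsgSize {
		return fmt.Errorf("grpc: received message larger than max (%d vs. %d)",
			length, defaultMaxRecvMsgSize)
	}
	return nil
}

func main() {
	// A healthy ~40 KB message passes the check.
	var ok [5]byte
	binary.BigEndian.PutUint32(ok[1:5], 40*1024)
	fmt.Println(checkRecvSize(ok)) // <nil>

	// A prefix carrying an absurd length, like the one in the error,
	// is rejected before the body is ever read.
	var bad [5]byte
	binary.BigEndian.PutUint32(bad[1:5], 845559858)
	fmt.Println(checkRecvSize(bad))
}
```

Note that raising the client-side limit would not help here: the reported length is roughly 845 MB for a 40 KB message, which suggests garbage bytes being read as the length field rather than a genuinely large response.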

What you expected to happen:

How to reproduce it (as minimally and precisely as possible): I tried running linkerd locally, but couldn’t reproduce this issue.

Anything else we need to know?:

Environment:

  • linkerd/namerd version, config files: linkerd 1.3.3, no namerd
  • Platform, version, and config files (Kubernetes, DC/OS, etc.): DC/OS Enterprise 1.9
  • Cloud provider or hardware configuration: AWS

linkerd configuration:

---
usage:
  enabled: false
admin:
  port: 9990
  ip: 0.0.0.0
namers:
- kind: io.l5d.marathon
  host: leader.mesos
  port: 443
  uriPrefix: "/marathon"
  prefix: "/io.l5d.marathon"
  tls:
    commonName: master.mesos
    trustCerts:
    - "/mnt/mesos/sandbox/.ssl/ca.crt"
routers:
- protocol: h2
  experimental: true
  identifier:
    kind: io.l5d.header.token
  dtab: "/ph=>/$/io.buoyant.rinet;/srv=>/$/io.buoyant.porthostPfx/ph;/svc=>/srv;/marathonId=>/#/io.l5d.marathon;/svc=>/$/io.buoyant.http.domainToPathPfx/marathonId"
  servers:
  - port: 4140
    ip: 0.0.0.0

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 14 (7 by maintainers)

Top GitHub Comments

2 reactions
vadimi commented, Dec 13, 2017

@wmorgan I was able to troubleshoot it a bit further. I added some logging to the gRPC client to get HTTP/2 framing info:

2017/12/13 15:24:23 [FrameHeader HEADERS flags=END_HEADERS|PRIORITY stream=59 len=10]
2017/12/13 15:24:23 [FrameHeader DATA stream=59 len=16384]
2017/12/13 15:24:23 [FrameHeader DATA stream=59 len=16384]
2017/12/13 15:24:23 [FrameHeader DATA stream=59 len=16384]
2017/12/13 15:24:23 [FrameHeader DATA stream=59 len=1761]
2017/12/13 15:24:23 [FrameHeader HEADERS flags=END_STREAM|END_HEADERS|PRIORITY stream=59 len=8]
2017/12/13 15:24:23 [FrameHeader HEADERS flags=END_HEADERS|PRIORITY stream=61 len=10]
2017/12/13 15:24:23 [FrameHeader DATA stream=61 len=16384]
2017/12/13 15:24:23 rpc error: code = ResourceExhausted desc = grpc: received message larger than max (845559858 vs. 4194304)

I call the same endpoint with the same parameters several times (unary call). Stream 59 is a successful one; stream 61 is where the error happens. gRPC expects the message in an HTTP/2 DATA frame to be length-prefixed, so it reads the first few bytes to get the message length. The DATA frame on stream 61 apparently lacks that length prefix, so gRPC interprets a few bytes of the message body as the length and throws the error.

Are there any logs I can get you from linkerd? I still cannot reproduce the issue locally.

1 reaction
vadimi commented, Dec 8, 2017

I tried to reproduce it in my local cluster, but unfortunately no luck. I can, however, reliably reproduce it on two AWS DC/OS clusters, one running 1.9 and the other 1.10.

Considering the issue started happening between 1.3.2 and 1.3.3, I tried to narrow it down to a specific commit. Here is what I’ve got: I built two linkerd docker images, one from commit https://github.com/linkerd/linkerd/commit/c6f0d2eaeecca80c60314e6f6cb852a31870877a and another from commit https://github.com/linkerd/linkerd/commit/0bd8a91ed51fecc34b86f110382e2077a7b88600. The issue happens with the first one but not the second, so it looks like https://github.com/linkerd/linkerd/commit/c6f0d2eaeecca80c60314e6f6cb852a31870877a contributes to the issue somehow.


Top Results From Across the Web

  • Linkerd stops sending traffic to grpc kubernetes pods: “I have been seeing this behavior multiple times now. I am running linkerd:1.3.4. The following is the full set of configuration...”
  • Unable to make gRPC calls via Linkerd - Help: “I am having a hard time getting k8s intra-cluster gRPC calls to work. I have set up Linkerd as a DaemonSet using the servicemesh.yaml...”
  • RST_STREAM errors with gRPC - Kubernetes - Linkerd: “I have set up linkerd-1.3.1 with kubernetes-1.6.6, with a linkerd daemonset running for gRPC between node.js microservices. I am having issues...”
  • Unable to configure Linkerd as gRPC load balancer - Linkerd2: “I am trying to configure Linkerd as a load balancer for my gRPC client-server communication. All servers and the client are running in Kubernetes...”
  • Using Linkerd as a Service Mesh Proxy at WePay: “Our meetup session about gRPC talks a bit about using this pattern for... the following issues have proven to be very important...”
