HTTP 504 Gateway timeouts after upgrading to linkerd 1.2.0

Filing a linkerd issue

Issue Type: Bug Report

What happened: We have a Rails application that serves web pages and runs as a Kubernetes deployment. Every time a deployment is made and new pods come up, linkerd returns 504 Gateway Timeout. When I checked the linkerd logs, I could see that it was still making requests to the old endpoints. (Not sure if this is a config issue.) It gets fixed on its own after some time.

What you expected to happen: Endpoints should be instantly updated whenever there is a deployment.

How to reproduce it (as minimally and precisely as possible): Run a Rails Puma server that serves web pages, and make requests to the service just after a deployment.

Anything else we need to know?: Our router config:

    - protocol: http
      label: webapp-external
      identifier:
        kind: io.l5d.header.token
        header: Host
      interpreter:
        kind: io.l5d.namerd
        dst: /$/inet/namerd.linkerd.svc.cluster.local/4100
        namespace: external
        transformers:
        - kind: io.l5d.k8s.daemonset
          namespace: linkerd
          port: webapp-ingress
          service: linkerd-internal
      servers:
      - port: 4143
        ip: 0.0.0.0
      client:
        kind: io.l5d.global
        loadBalancer:
          kind: ewma
          enableProbation: false
          maxEffort: 5
          decayTimeMs: 10
        failureAccrual:
          kind: io.l5d.consecutiveFailures
          failures: 5
      service:
        kind: io.l5d.global

    - protocol: http
      label: webapp-ingress
      identifier:
        kind: io.l5d.header.token
        header: Host
      interpreter:
        kind: io.l5d.namerd
        dst: /$/inet/namerd.linkerd.svc.cluster.local/4100
        namespace: external
        transformers:
        - kind: io.l5d.k8s.localnode
      servers:
      - port: 4145
        ip: 0.0.0.0
      client:
        kind: io.l5d.global
        loadBalancer:
          kind: ewma
          enableProbation: false
          maxEffort: 5
          decayTimeMs: 10
        failureAccrual:
          kind: io.l5d.consecutiveFailures
          failures: 5
      service:
        kind: io.l5d.global
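
For readers unfamiliar with the `failureAccrual` block in the config above, here is a minimal Python sketch of the general idea behind a consecutive-failures policy with `failures: 5`. It is an illustration only, not linkerd's implementation: the class name `ConsecutiveFailures` and its methods are hypothetical, and the sketch leaves out whatever linkerd does to revive a host after it has been marked dead.

```python
class ConsecutiveFailures:
    """Hypothetical sketch: mark a host dead after N failures in a row."""

    def __init__(self, failures=5):
        self.threshold = failures
        self.streak = 0  # current run of consecutive failures

    def record(self, success):
        # Any success resets the streak; a failure extends it.
        self.streak = 0 if success else self.streak + 1

    def is_dead(self):
        return self.streak >= self.threshold


policy = ConsecutiveFailures(failures=5)
for _ in range(5):
    policy.record(success=False)
print(policy.is_dead())
```

Under a policy of this shape, a stale endpoint is only taken out of rotation after several requests in a row have failed, which may be one factor in how long old pods keep receiving traffic.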

Environment:

  • linkerd/namerd version, config files: 1.2.0/1.2.0
  • Platform, version, and config files (Kubernetes, DC/OS, etc): Kubernetes
  • Cloud provider or hardware configuration: AWS

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 50 (25 by maintainers)

Top GitHub Comments

hawkw commented, Sep 20, 2017 (4 reactions)

@bseibel I’ve been looking into this some more today and I agree that this issue is almost certainly related to kubernetes/kubernetes#35068. That also explains why our unit tests haven’t caught this issue, as the tests for handling the “too old resource version” response set the response status code to 410.
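
To make the mismatch concrete, here is a minimal Python sketch of a check that recognises the “too old resource version” condition both ways: as an HTTP 410 status, and as an error event carried in the body of a response with another status code. The function name and the exact JSON shape (`type`, `object.code`, `object.message`) are assumptions for illustration; this is not linkerd's code, which is written in Scala.

```python
import json

GONE = 410  # HTTP status the unit tests described above assume


def is_resource_version_too_old(http_status, body_line):
    """Return True if a watch response signals an expired resource version.

    Hypothetical helper: checks the HTTP status first, then falls back to
    inspecting an error event in the response body.
    """
    if http_status == GONE:
        return True
    try:
        event = json.loads(body_line)
    except (TypeError, ValueError):
        return False
    if not isinstance(event, dict) or event.get("type") != "ERROR":
        return False
    status = event.get("object")
    if not isinstance(status, dict):
        return False
    message = str(status.get("message", "")).lower()
    return status.get("code") == GONE or "too old resource version" in message


# Assumed shape of an in-band error event delivered with HTTP 200.
line = json.dumps({
    "type": "ERROR",
    "object": {"kind": "Status", "code": 410,
               "message": "too old resource version"},
})
print(is_resource_version_too_old(200, line))
```

A test that only sets the status code to 410 exercises the first branch and never reaches the body inspection, which is the gap the comment describes.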

bseibel commented, Oct 3, 2017 (2 reactions)

Unfortunately, we’re still seeing this issue even with the fix here. We now see this in the debug logs:

D 1003 19:20:25.158 UTC THREAD51 TraceId:9921f7129139749d: k8s returned 'too old resource version' error with incorrect HTTP status code, restarting watch

However, we do see lines like these (pardon my slightly filtered log lines without the endpoints):

E  D 1003 19:51:13.636 UTC THREAD65 TraceId:8b3991f3b0f04b0d: k8s ns default svc yarisgrmn constructed new ServiceEndpoints with:
E  D 1003 19:51:13.636 UTC THREAD65 TraceId:8b3991f3b0f04b0d: k8s ns default service yarisgrmn added port mappings
E  D 1003 19:51:13.636 UTC THREAD65 TraceId:8b3991f3b0f04b0d: k8s ns default service yarisgrmn added endpoints

for most pre-existing endpoints, which is fine and expected. But a service that was added after the “restarting watch” line doesn’t appear in the logs at all, and linkerd ends up giving us “No hosts are available”.
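
One common way to avoid missing objects created while a watch is down is to re-list the full state before watching again. The Python sketch below illustrates that general list-then-watch pattern. It is not the fix applied in linkerd or namerd, and `resync`, `fake_list`, the service name `new-svc`, the endpoint addresses, and the version string are all hypothetical.

```python
def resync(cache, list_services):
    """Rebuild the cache from a full list and return the version to watch from.

    `list_services` is a hypothetical callable returning
    (items, resource_version), where each item is a dict with
    "name" and "endpoints" keys.
    """
    items, resource_version = list_services()
    fresh = {svc["name"]: svc["endpoints"] for svc in items}
    cache.clear()
    cache.update(fresh)
    return resource_version


def fake_list():
    # Stand-in for a real API call; "new-svc" appeared while the watch was down.
    return (
        [
            {"name": "yarisgrmn", "endpoints": ["10.0.0.1:8080"]},
            {"name": "new-svc", "endpoints": ["10.0.0.2:8080"]},
        ],
        "1234",
    )


cache = {"yarisgrmn": ["10.0.0.9:8080"]}  # stale state from before the restart
version = resync(cache, fake_list)
print(sorted(cache), version)
```

After the resync, the cache contains the service that was added during the gap, and the returned version is where a new watch would start.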

So far we’ve only seen this happen in production, but it happens pretty frequently, sometimes minutes after we kick our namerd pods. Linkerd isn’t logging any issues about connectivity to namerd. I’m out of town at the moment, and I’m going to try to narrow down the issue further when I’m back later this week. If there’s anything specific you would like me to poke at to help narrow it down, please let me know.
