
Support for draining node during pod termination


Please describe your use case / problem.

We have a legacy, very stateful application (Tomcat/Java) that requires sticky sessions. When we deploy a new version of the application, or a scale-down event happens, we need to stop sending new connections to a server while continuing to route already-bound sessions to it. Please note: this is not about in-flight requests; we need the active Tomcat sessions to expire, which normally takes about an hour.

We have this working today in Kubernetes with HAProxy-Ingress by setting its drain-support flag, which drains a pod when it transitions to Terminating but keeps existing sessions attached to that pod. We then use a preStop hook that blocks shutdown until users have finished their sessions.
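For reference, the HAProxy-Ingress drain behavior described above is enabled through a global ConfigMap key. A minimal sketch; the ConfigMap name and namespace here are assumptions based on a default HAProxy-Ingress install, not something from this issue:

```yaml
# Global HAProxy-Ingress configuration. With drain-support enabled,
# Terminating pods stop receiving new sessions but continue to serve
# cookie-bound ones until the pod actually shuts down.
apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy-ingress         # name/namespace assumed from a default install
  namespace: ingress-controller
data:
  drain-support: "true"
```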

When evaluating Ambassador for this ingress role, we see the pod removed from routing as soon as it transitions to Terminating. We tested this with a simple application that returns the hostname of the pod:

  • Open two sessions against different pods (Pod A and Pod B)
  • Scale the deployment down to 1; Pod A moves to Terminating, Pod B stays active
  • The session immediately switches over to the remaining pod, i.e. the Pod A session is dropped

Describe the solution you’d like

We’d love for Ambassador to drain a pod when it is Terminating: keep existing connections attached to it, whilst routing new connections to other active pods.

Additional context

Looking at the Kubernetes docs, it appears that pods are dropped from the Service endpoints as soon as the pod moves to Terminating. As per: https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods

3. Pod shows up as “Terminating” when listed in client commands

4. (simultaneous with 3) When the Kubelet sees that a Pod has been marked as terminating because the time in 2 has been set, it begins the pod shutdown process.
...
5. (simultaneous with 3) Pod is removed from endpoints list for service, and are no longer considered part of the set of running pods for replication controllers. Pods that shutdown slowly cannot continue to serve traffic as load balancers (like the service proxy) remove them from their rotations.

Test Details

Ambassador version: 0.60.1

Ambassador Service Configuration

getambassador.io/config: |
  ---
  apiVersion: ambassador/v1
  kind: KubernetesEndpointResolver
  name: my-resolver
  ---
  apiVersion: ambassador/v1
  kind: Module
  name: ambassador
  config:
    resolver: my-resolver
    load_balancer:
      policy: round_robin

Ambassador Target Service Configuration:

apiVersion: v1
kind: Service
metadata:
  name: sticky
  labels:
    app: sticky
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v1
      kind: Mapping
      name: sticky_mapping
      prefix: /sticky/
      service: sticky
      resolver: my-resolver
      load_balancer:
        policy: maglev
        cookie:
          name: sticky-cookie
          ttl: 300s
spec:
  ports:
  - name: http
    port: 80
  selector:
    app: sticky
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sticky
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sticky
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
      labels:
        app: sticky
    spec:
      terminationGracePeriodSeconds: 120
      containers:
      - name: hello
        image: nginxdemos/hello
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sleep
              - '60'
        ports:
        - containerPort: 80
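In the test Deployment above, the preStop hook is a fixed 60-second sleep. For the Tomcat use case in the issue, one option would be to poll for active sessions instead, so the pod only exits once users have finished. A hedged sketch, assuming the ingress keeps routing bound sessions to the Terminating pod; the metrics endpoint is hypothetical, standing in for however your Tomcat exposes its active session count, and terminationGracePeriodSeconds must be raised to match the expected session lifetime:

```yaml
lifecycle:
  preStop:
    exec:
      command:
      - /bin/sh
      - -c
      # Hypothetical endpoint: block shutdown until no sessions remain.
      # The kubelet caps this wait at terminationGracePeriodSeconds.
      - |
        while [ "$(curl -s localhost:8080/metrics/sessions)" -gt 0 ]; do
          sleep 10
        done
```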

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 14
  • Comments: 34 (14 by maintainers)

Top GitHub Comments

1 reaction
mohitreddy1996 commented, May 31, 2022

Hi, any updates on whether this is supported in the non-legacy versions?

If not, I’m curious whether switching to LEGACY mode would have any consequences.

1 reaction
marianafranco commented, Dec 5, 2021

@rbtcollins Thanks for this fix! I tested it in one of our staging environments and it’s working (no 5XXs during pod restarts/termination) when AMBASSADOR_LEGACY_MODE=true is set.

Unfortunately, I was mistaken and we are not using AMBASSADOR_LEGACY_MODE in production today. Do you know what the downside of moving back to AMBASSADOR_LEGACY_MODE vs. the default mode would be (we are using v1.14.2)?

