Support for draining node during pod termination
Please describe your use case / problem.
We have a legacy, very stateful application (Tomcat/Java) that needs sticky sessions. When we deploy a new version of the application, or a scale-down event happens, we need to stop sending new connections to a server while continuing to route already-bound sessions to it. Please note: this is not about in-flight requests; we need the active Tomcat sessions to expire, which normally takes about an hour.
We currently have this working in Kubernetes with HAProxy-Ingress by setting its `drain-support` flag, which drains a pod when it transitions to Terminating but keeps existing sessions attached to that pod. We then have a `preStop` hook which blocks the shutdown until users have finished their sessions.
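For reference, the HAProxy-Ingress setup described above can be sketched roughly as follows. This is an illustrative sketch, not our exact manifests: the ConfigMap key assumes the jcmoraisjr/haproxy-ingress controller, and the `/sessions-active` endpoint in the `preStop` hook is a hypothetical stand-in for whatever signal the application exposes for active sessions:

```yaml
# Sketch: HAProxy-Ingress drain support plus a session-aware preStop hook.
# The drain-support key assumes jcmoraisjr/haproxy-ingress; check your
# controller's docs for the exact option name.
apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy-ingress
  namespace: ingress-controller
data:
  drain-support: "true"   # keep Terminating pods in the backend for bound sessions
---
# Pod template fragment: block shutdown until sessions have expired.
# /sessions-active is a hypothetical app-specific endpoint returning the
# current session count.
spec:
  terminationGracePeriodSeconds: 3600
  containers:
    - name: tomcat
      lifecycle:
        preStop:
          exec:
            command:
              - /bin/sh
              - -c
              - |
                while curl -fsS localhost:8080/sessions-active | grep -qv '^0$'; do
                  sleep 30
                done
```

Note that `terminationGracePeriodSeconds` has to be at least as long as the worst-case drain time, or the kubelet will kill the pod before the hook finishes.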
When looking at Ambassador to fulfill this ingress role, we see the pod being removed from routing as soon as it transitions to Terminating. This was tested with a simple application returning the hostname of the pod:
- Getting two sessions against different pods - Pod A and Pod B
- Scale the deployment down to 1 - Pod A moved to Terminating, Pod B still active
- Session immediately switched over to the remaining pod - that is, Pod A session is dropped
Describe the solution you’d like
We’d love for Ambassador to drain a pod when it is Terminating: keep existing connections and sessions attached to that pod, whilst routing new connections to other active pods.
Additional context
Looking at the Kubernetes docs, it appears that pods are dropped from the Service endpoints as soon as the pod moves to Terminating. As per: https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods
3. Pod shows up as “Terminating” when listed in client commands
4. (simultaneous with 3) When the Kubelet sees that a Pod has been marked as terminating because the time in 2 has been set, it begins the pod shutdown process.
...
5. (simultaneous with 3) Pod is removed from endpoints list for service, and are no longer considered part of the set of running pods for replication controllers. Pods that shutdown slowly cannot continue to serve traffic as load balancers (like the service proxy) remove them from their rotations.
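One partial workaround sometimes discussed for this behaviour is the Service field `publishNotReadyAddresses`, which tells the endpoints controller to include pod IPs even when readiness checks fail. Whether it also helps for pods in Terminating depends on the Kubernetes version and the endpoint controller in use, so treat this as a sketch rather than a confirmed fix:

```yaml
# Sketch: keep not-ready pod addresses in the Endpoints object so an
# endpoint-routing resolver can still see them. Not verified to cover
# the Terminating case described in this issue.
apiVersion: v1
kind: Service
metadata:
  name: sticky
spec:
  publishNotReadyAddresses: true
  selector:
    app: sticky
  ports:
    - name: http
      port: 80
```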
Test Details
Ambassador version: 0.60.1
Ambassador Service Configuration
```yaml
getambassador.io/config: |
  ---
  apiVersion: ambassador/v1
  kind: KubernetesEndpointResolver
  name: my-resolver
  ---
  apiVersion: ambassador/v1
  kind: Module
  name: ambassador
  config:
    resolver: my-resolver
    load_balancer:
      policy: round_robin
```
Ambassador Target Service Configuration:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: sticky
  labels:
    app: sticky
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v1
      kind: Mapping
      name: sticky_mapping
      prefix: /sticky/
      service: sticky
      resolver: my-resolver
      load_balancer:
        policy: maglev
        cookie:
          name: sticky-cookie
          ttl: 300s
spec:
  ports:
    - name: http
      port: 80
  selector:
    app: sticky
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sticky
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sticky
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
      labels:
        app: sticky
    spec:
      terminationGracePeriodSeconds: 120
      containers:
        - name: hello
          image: nginxdemos/hello
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sleep
                  - '60'
          ports:
            - containerPort: 80
```
Issue Analytics
- State:
- Created: 4 years ago
- Reactions: 14
- Comments: 34 (14 by maintainers)
Top GitHub Comments
Hi, any updates if this is supported on the non-legacy versions?
If not, curious to understand if switching to LEGACY mode will have any consequences.
@rbtcollins Thanks for this fix! I tested it in one of our staging environments and it’s working (no 5XXs during pod restarts/termination) when AMBASSADOR_LEGACY_MODE=true is set.
Unfortunately, I was mistaken and we are not using AMBASSADOR_LEGACY_MODE in production today. Do you know what the downside would be of moving back to AMBASSADOR_LEGACY_MODE vs the default mode (we are using v1.14.2)?