Unhealthy pod after upgrade
Chart version: 7.9.3, with 7.16.1 image
Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"archive", BuildDate:"2021-07-22T00:00:00Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.13", GitCommit:"ddb8dba0a4420ec79de95cebfa4bd7d41aed848a", GitTreeState:"clean", BuildDate:"2021-11-01T16:41:29Z", GoVersion:"go1.15.14", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.21) and server (1.19) exceeds the supported minor version skew of +/-1
Kubernetes provider: Azure
Helm Version:
version.BuildInfo{Version:"v3.4.1", GitCommit:"c4e74854886b2efe3321e185578e6db9be0a6e29", GitTreeState:"clean", GoVersion:"go1.14.11"}
Output of helm get release
NAME: testbed-elasticsearch
LAST DEPLOYED: Wed Dec 15 13:25:30 2021
NAMESPACE: testbed-elasticsearch
STATUS: deployed
REVISION: 4
USER-SUPPLIED VALUES:
antiAffinity: hard
antiAffinityTopologyKey: kubernetes.io/hostname
clusterHealthCheckParams: wait_for_status=green&timeout=2s
clusterName: elasticsearch
enableServiceLinks: true
envFrom: []
esConfig:
elasticsearch.yml: |
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: /usr/share/elasticsearch/config/certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: /usr/share/elasticsearch/config/certs/elastic-certificates.p12
esJavaOpts: -Xmx1g -Xms1g
esMajorVersion: ""
extraContainers: []
extraEnvs:
- name: ELASTIC_PASSWORD
valueFrom:
secretKeyRef:
key: password
name: elastic-credentials
- name: ELASTIC_USERNAME
valueFrom:
secretKeyRef:
key: username
name: elastic-credentials
extraInitContainers: []
extraVolumeMounts: []
extraVolumes: []
fsGroup: ""
fullnameOverride: ""
httpPort: 9200
image: docker.elastic.co/elasticsearch/elasticsearch
imagePullPolicy: IfNotPresent
imagePullSecrets: []
imageTag: 7.16.1
ingress:
annotations:
acme.cert-manager.io/http01-edit-in-place: "true"
cert-manager.io/cluster-issuer: letsencrypt-prod
ingress.kubernetes.io/ssl-redirect: "true"
kubernetes.io/ingress.class: nginx
enabled: false
hosts:
- testbed-example.com
path: /
tls:
- hosts:
- testbed-example.com
secretName: testbed-es-tls-secret
initResources: {}
keystore: []
labels: {}
lifecycle: {}
masterService: ""
masterTerminationFix: false
maxUnavailable: 1
minimumMasterNodes: 2
nameOverride: ""
networkHost: 0.0.0.0
nodeAffinity: {}
nodeGroup: master
nodeSelector: {}
persistence:
annotations: {}
enabled: true
labels:
enabled: true
podAnnotations: {}
podManagementPolicy: Parallel
podSecurityContext:
fsGroup: 1000
runAsUser: 1000
podSecurityPolicy:
create: false
name: ""
spec:
fsGroup:
rule: RunAsAny
privileged: true
runAsUser:
rule: RunAsAny
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- secret
- configMap
- persistentVolumeClaim
priorityClassName: ""
protocol: http
rbac:
create: false
serviceAccountAnnotations: {}
serviceAccountName: ""
readinessProbe:
failureThreshold: 4
initialDelaySeconds: 20
periodSeconds: 20
successThreshold: 3
timeoutSeconds: 10
replicas: 3
resources:
limits:
cpu: 1000m
memory: 2Gi
requests:
cpu: 1000m
memory: 2Gi
roles:
data: "true"
ingest: "true"
master: "true"
remote_cluster_client: "true"
schedulerName: ""
secretMounts:
- name: elastic-certificates
path: /usr/share/elasticsearch/config/certs
secretName: elastic-certificates
securityContext:
capabilities:
drop:
- ALL
runAsNonRoot: true
runAsUser: 1000
service:
annotations: {}
externalTrafficPolicy: ""
httpPortName: http
labels: {}
labelsHeadless: {}
loadBalancerIP: ""
loadBalancerSourceRanges: []
nodePort: ""
transportPortName: transport
type: ClusterIP
sidecarResources: {}
sysctlInitContainer:
enabled: true
sysctlVmMaxMapCount: 262144
terminationGracePeriod: 120
tolerations: []
transportPort: 9300
updateStrategy: RollingUpdate
volumeClaimTemplate:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 30Gi
COMPUTED VALUES:
antiAffinity: hard
antiAffinityTopologyKey: kubernetes.io/hostname
clusterHealthCheckParams: wait_for_status=green&timeout=2s
clusterName: elasticsearch
enableServiceLinks: true
envFrom: []
esConfig:
elasticsearch.yml: |
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: /usr/share/elasticsearch/config/certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: /usr/share/elasticsearch/config/certs/elastic-certificates.p12
esJavaOpts: -Xmx1g -Xms1g
esMajorVersion: ""
extraContainers: []
extraEnvs:
- name: ELASTIC_PASSWORD
valueFrom:
secretKeyRef:
key: password
name: elastic-credentials
- name: ELASTIC_USERNAME
valueFrom:
secretKeyRef:
key: username
name: elastic-credentials
extraInitContainers: []
extraVolumeMounts: []
extraVolumes: []
fsGroup: ""
fullnameOverride: ""
httpPort: 9200
image: docker.elastic.co/elasticsearch/elasticsearch
imagePullPolicy: IfNotPresent
imagePullSecrets: []
imageTag: 7.16.1
ingress:
annotations:
acme.cert-manager.io/http01-edit-in-place: "true"
cert-manager.io/cluster-issuer: letsencrypt-prod
ingress.kubernetes.io/ssl-redirect: "true"
kubernetes.io/ingress.class: nginx
enabled: false
hosts:
- testbed-example.com
path: /
tls:
- hosts:
- testbed-example.com
secretName: testbed-es-tls-secret
initResources: {}
keystore: []
labels: {}
lifecycle: {}
masterService: ""
masterTerminationFix: false
maxUnavailable: 1
minimumMasterNodes: 2
nameOverride: ""
networkHost: 0.0.0.0
nodeAffinity: {}
nodeGroup: master
nodeSelector: {}
persistence:
annotations: {}
enabled: true
labels:
enabled: true
podAnnotations: {}
podManagementPolicy: Parallel
podSecurityContext:
fsGroup: 1000
runAsUser: 1000
podSecurityPolicy:
create: false
name: ""
spec:
fsGroup:
rule: RunAsAny
privileged: true
runAsUser:
rule: RunAsAny
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- secret
- configMap
- persistentVolumeClaim
priorityClassName: ""
protocol: http
rbac:
create: false
serviceAccountAnnotations: {}
serviceAccountName: ""
readinessProbe:
failureThreshold: 4
initialDelaySeconds: 20
periodSeconds: 20
successThreshold: 3
timeoutSeconds: 10
replicas: 3
resources:
limits:
cpu: 1000m
memory: 2Gi
requests:
cpu: 1000m
memory: 2Gi
roles:
data: "true"
ingest: "true"
master: "true"
remote_cluster_client: "true"
schedulerName: ""
secretMounts:
- name: elastic-certificates
path: /usr/share/elasticsearch/config/certs
secretName: elastic-certificates
securityContext:
capabilities:
drop:
- ALL
runAsNonRoot: true
runAsUser: 1000
service:
annotations: {}
externalTrafficPolicy: ""
httpPortName: http
labels: {}
labelsHeadless: {}
loadBalancerIP: ""
loadBalancerSourceRanges: []
nodePort: ""
transportPortName: transport
type: ClusterIP
sidecarResources: {}
sysctlInitContainer:
enabled: true
sysctlVmMaxMapCount: 262144
terminationGracePeriod: 120
tolerations: []
transportPort: 9300
updateStrategy: RollingUpdate
volumeClaimTemplate:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 30Gi
HOOKS:
---
# Source: elasticsearch/templates/test/test-elasticsearch-health.yaml
apiVersion: v1
kind: Pod
metadata:
name: "testbed-elasticsearch-cmrxm-test"
annotations:
"helm.sh/hook": test-success
spec:
securityContext:
fsGroup: 1000
runAsUser: 1000
containers:
- name: "testbed-elasticsearch-fussv-test"
image: "docker.elastic.co/elasticsearch/elasticsearch:7.16.1"
imagePullPolicy: "IfNotPresent"
command:
- "sh"
- "-c"
- |
#!/usr/bin/env bash -e
curl -XGET --fail 'elasticsearch-master:9200/_cluster/health?wait_for_status=green&timeout=2s'
restartPolicy: Never
MANIFEST:
---
# Source: elasticsearch/templates/poddisruptionbudget.yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
name: "elasticsearch-master-pdb"
spec:
maxUnavailable: 1
selector:
matchLabels:
app: "elasticsearch-master"
---
# Source: elasticsearch/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: elasticsearch-master-config
labels:
heritage: "Helm"
release: "testbed-elasticsearch"
chart: "elasticsearch"
app: "elasticsearch-master"
data:
elasticsearch.yml: |
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: /usr/share/elasticsearch/config/certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: /usr/share/elasticsearch/config/certs/elastic-certificates.p12
---
# Source: elasticsearch/templates/service.yaml
kind: Service
apiVersion: v1
metadata:
name: elasticsearch-master
labels:
heritage: "Helm"
release: "testbed-elasticsearch"
chart: "elasticsearch"
app: "elasticsearch-master"
annotations:
{}
spec:
type: ClusterIP
selector:
release: "testbed-elasticsearch"
chart: "elasticsearch"
app: "elasticsearch-master"
ports:
- name: http
protocol: TCP
port: 9200
- name: transport
protocol: TCP
port: 9300
---
# Source: elasticsearch/templates/service.yaml
kind: Service
apiVersion: v1
metadata:
name: elasticsearch-master-headless
labels:
heritage: "Helm"
release: "testbed-elasticsearch"
chart: "elasticsearch"
app: "elasticsearch-master"
annotations:
service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
spec:
clusterIP: None # This is needed for statefulset hostnames like elasticsearch-0 to resolve
# Create endpoints also if the related pod isn't ready
publishNotReadyAddresses: true
selector:
app: "elasticsearch-master"
ports:
- name: http
port: 9200
- name: transport
port: 9300
---
# Source: elasticsearch/templates/statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: elasticsearch-master
labels:
heritage: "Helm"
release: "testbed-elasticsearch"
chart: "elasticsearch"
app: "elasticsearch-master"
annotations:
esMajorVersion: "7"
spec:
serviceName: elasticsearch-master-headless
selector:
matchLabels:
app: "elasticsearch-master"
replicas: 3
podManagementPolicy: Parallel
updateStrategy:
type: RollingUpdate
volumeClaimTemplates:
- metadata:
name: elasticsearch-master
labels:
heritage: "Helm"
release: "testbed-elasticsearch"
chart: "elasticsearch"
app: "elasticsearch-master"
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 30Gi
template:
metadata:
name: "elasticsearch-master"
labels:
heritage: "Helm"
release: "testbed-elasticsearch"
chart: "elasticsearch"
app: "elasticsearch-master"
annotations:
configchecksum: 22a782517a55e99c9cbbf8968adf234abfbfc63c77e35116bc894d6820d3404
spec:
securityContext:
fsGroup: 1000
runAsUser: 1000
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- "elasticsearch-master"
topologyKey: kubernetes.io/hostname
terminationGracePeriodSeconds: 120
volumes:
- name: elastic-certificates
secret:
secretName: elastic-certificates
- name: esconfig
configMap:
name: elasticsearch-master-config
enableServiceLinks: true
initContainers:
- name: configure-sysctl
securityContext:
runAsUser: 0
privileged: true
image: "docker.elastic.co/elasticsearch/elasticsearch:7.16.1"
imagePullPolicy: "IfNotPresent"
command: ["sysctl", "-w", "vm.max_map_count=262144"]
resources:
{}
containers:
- name: "elasticsearch"
securityContext:
capabilities:
drop:
- ALL
runAsNonRoot: true
runAsUser: 1000
image: "docker.elastic.co/elasticsearch/elasticsearch:7.16.1"
imagePullPolicy: "IfNotPresent"
readinessProbe:
exec:
command:
- sh
- -c
- |
#!/usr/bin/env bash -e
# If the node is starting up wait for the cluster to be ready (request params: "wait_for_status=green&timeout=2s" )
# Once it has started only check that the node itself is responding
START_FILE=/tmp/.es_start_file
# Disable nss cache to avoid filling dentry cache when calling curl
# This is required with Elasticsearch Docker using nss < 3.52
export NSS_SDB_USE_CACHE=no
http () {
local path="${1}"
local args="${2}"
set -- -XGET -s
if [ "$args" != "" ]; then
set -- "$@" $args
fi
if [ -n "${ELASTIC_USERNAME}" ] && [ -n "${ELASTIC_PASSWORD}" ]; then
set -- "$@" -u "${ELASTIC_USERNAME}:${ELASTIC_PASSWORD}"
fi
curl --output /dev/null -k "$@" "http://127.0.0.1:9200${path}"
}
if [ -f "${START_FILE}" ]; then
echo 'Elasticsearch is already running, lets check the node is healthy'
HTTP_CODE=$(http "/" "-w %{http_code}")
RC=$?
if [[ ${RC} -ne 0 ]]; then
echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with RC ${RC}"
exit ${RC}
fi
# ready if HTTP code 200, 503 is tolerable if ES version is 6.x
if [[ ${HTTP_CODE} == "200" ]]; then
exit 0
elif [[ ${HTTP_CODE} == "503" && "7" == "6" ]]; then
exit 0
else
echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with HTTP code ${HTTP_CODE}"
exit 1
fi
else
echo 'Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=2s" )'
if http "/_cluster/health?wait_for_status=green&timeout=2s" "--fail" ; then
touch ${START_FILE}
exit 0
else
echo 'Cluster is not yet ready (request params: "wait_for_status=green&timeout=2s" )'
exit 1
fi
fi
failureThreshold: 4
initialDelaySeconds: 20
periodSeconds: 20
successThreshold: 3
timeoutSeconds: 10
ports:
- name: http
containerPort: 9200
- name: transport
containerPort: 9300
resources:
limits:
cpu: 1000m
memory: 2Gi
requests:
cpu: 1000m
memory: 2Gi
env:
- name: node.name
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: cluster.initial_master_nodes
value: "elasticsearch-master-0,elasticsearch-master-1,elasticsearch-master-2,"
- name: discovery.seed_hosts
value: "elasticsearch-master-headless"
- name: cluster.name
value: "elasticsearch"
- name: network.host
value: "0.0.0.0"
- name: ES_JAVA_OPTS
value: "-Xmx1g -Xms1g"
- name: node.data
value: "true"
- name: node.ingest
value: "true"
- name: node.master
value: "true"
- name: node.remote_cluster_client
value: "true"
- name: ELASTIC_PASSWORD
valueFrom:
secretKeyRef:
key: password
name: elastic-credentials
- name: ELASTIC_USERNAME
valueFrom:
secretKeyRef:
key: username
name: elastic-credentials
volumeMounts:
- name: "elasticsearch-master"
mountPath: /usr/share/elasticsearch/data
- name: elastic-certificates
mountPath: /usr/share/elasticsearch/config/certs
- name: esconfig
mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
subPath: elasticsearch.yml
NOTES:
1. Watch all cluster members come up.
$ kubectl get pods --namespace=testbed-elasticsearch -l app=elasticsearch-master -w
2. Test cluster health using Helm test.
$ helm test testbed-elasticsearch --cleanup
Describe the bug:
After initiating a version upgrade, the cluster stays up: it turns yellow first and then, after a while, goes green again, but one pod, the one the upgrade started with, remains unhealthy and the other pods are never updated.
kubectl describe gives:
Normal Scheduled 28m default-scheduler Successfully assigned testbed-elasticsearch/elasticsearch-master-2 to aks-agentpool-41636598-2
Normal Pulling 28m kubelet Pulling image "docker.elastic.co/elasticsearch/elasticsearch:7.16.1"
Normal Pulled 28m kubelet Successfully pulled image "docker.elastic.co/elasticsearch/elasticsearch:7.16.1" in 12.054403933s
Normal Created 28m kubelet Created container configure-sysctl
Normal Started 28m kubelet Started container configure-sysctl
Normal Pulled 28m kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.16.1" already present on machine
Normal Created 28m kubelet Created container elasticsearch
Normal Started 28m kubelet Started container elasticsearch
Warning Unhealthy 27m kubelet Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=2s" )
Cluster is not yet ready (request params: "wait_for_status=green&timeout=2s" )
Warning Unhealthy 3m26s (x72 over 27m) kubelet Readiness probe failed: Elasticsearch is already running, lets check the node is healthy
curl --output /dev/null -k -XGET -s -w '%{http_code}' ${BASIC_AUTH} http://127.0.0.1:9200/ failed with HTTP code 200
sh: 30: [[: not found
sh: 35: [[: not found
sh: 37: [[: not found
Steps to reproduce:
helm upgrade testbed-elasticsearch . --namespace testbed-elasticsearch --create-namespace -f values.yaml
kubectl describe pod elasticsearch-master-2 --namespace=testbed-elasticsearch
Expected behavior: The upgrade succeeds.
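The "[[: not found" lines in the events above point at the likely root cause: the probe body is passed to sh -c, so its "#!/usr/bin/env bash -e" first line is only a comment and the script ends up interpreted by the image's /bin/sh, which in the 7.16 images no longer seems to be bash. The bash-only [[ ... ]] tests then fail, which is why the probe exits 1 even though curl received an HTTP 200, and because the StatefulSet uses RollingUpdate the controller waits for this pod to become Ready before touching the remaining pods. A minimal sketch of the same check in portable POSIX sh, reusing the http helper defined in the probe above (purely to illustrate the shell issue, not the chart's actual fix):

HTTP_CODE=$(http "/" "-w %{http_code}")
RC=$?
if [ "${RC}" -ne 0 ]; then
  echo "readiness request failed with RC ${RC}"
  exit "${RC}"
fi
# POSIX [ ] string comparison instead of the bash-only [[ ... ]]
if [ "${HTTP_CODE}" = "200" ]; then
  exit 0
fi
echo "readiness request failed with HTTP code ${HTTP_CODE}"
exit 1

Upgrading the chart itself, as suggested in the comments below, looks like the proper fix.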
Issue Analytics
- Created 2 years ago
- Comments: 8 (4 by maintainers)
Top GitHub Comments
I have the same issue with a fresh deploy on version 7.16.1.
UPDATE: after running helm repo update as ebuildy suggested, it worked.

You must also upgrade the Helm chart to 7.16.1, just released yesterday!
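For reference, a rough sketch of that fix, assuming the chart is installed from the official elastic Helm repository (the repository name "elastic" is an assumption here):

helm repo update
helm upgrade testbed-elasticsearch elastic/elasticsearch --version 7.16.1 \
  --namespace testbed-elasticsearch -f values.yaml

With both the chart and the image on 7.16.1, the rolling upgrade should then proceed.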