Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Unhealthy pod after upgrade

See original GitHub issue

Chart version: 7.9.3, with 7.16.1 image

Kubernetes version:

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"archive", BuildDate:"2021-07-22T00:00:00Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.13", GitCommit:"ddb8dba0a4420ec79de95cebfa4bd7d41aed848a", GitTreeState:"clean", BuildDate:"2021-11-01T16:41:29Z", GoVersion:"go1.15.14", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.21) and server (1.19) exceeds the supported minor version skew of +/-1

Kubernetes provider: Azure

Helm Version: version.BuildInfo{Version:"v3.4.1", GitCommit:"c4e74854886b2efe3321e185578e6db9be0a6e29", GitTreeState:"clean", GoVersion:"go1.14.11"}

helm get release output

e.g. helm get elasticsearch (replace elasticsearch with the name of your helm release)

Be careful to obfuscate any secrets (credentials, tokens, public IPs, …) that could be visible in the output before copy-pasting.

If you find secrets in plain text in the helm get release output, you should use Kubernetes Secrets to manage them in a secure way (see the Security Example).
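
For instance, the elastic-credentials secret referenced by extraEnvs in this release can be created out-of-band, so the password never appears in values.yaml. A minimal sketch (the username and password values here are placeholders, not taken from the report):

  kubectl create secret generic elastic-credentials \
    --namespace=testbed-elasticsearch \
    --from-literal=username=elastic \
    --from-literal=password='<your-password>'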

Output of helm get release
NAME: testbed-elasticsearch
LAST DEPLOYED: Wed Dec 15 13:25:30 2021
NAMESPACE: testbed-elasticsearch
STATUS: deployed
REVISION: 4
USER-SUPPLIED VALUES:
antiAffinity: hard
antiAffinityTopologyKey: kubernetes.io/hostname
clusterHealthCheckParams: wait_for_status=green&timeout=2s
clusterName: elasticsearch
enableServiceLinks: true
envFrom: []
esConfig:
  elasticsearch.yml: |
    xpack.security.enabled: true
    xpack.security.transport.ssl.enabled: true
    xpack.security.transport.ssl.verification_mode: certificate
    xpack.security.transport.ssl.keystore.path: /usr/share/elasticsearch/config/certs/elastic-certificates.p12
    xpack.security.transport.ssl.truststore.path: /usr/share/elasticsearch/config/certs/elastic-certificates.p12
esJavaOpts: -Xmx1g -Xms1g
esMajorVersion: ""
extraContainers: []
extraEnvs:
- name: ELASTIC_PASSWORD
  valueFrom:
    secretKeyRef:
      key: password
      name: elastic-credentials
- name: ELASTIC_USERNAME
  valueFrom:
    secretKeyRef:
      key: username
      name: elastic-credentials
extraInitContainers: []
extraVolumeMounts: []
extraVolumes: []
fsGroup: ""
fullnameOverride: ""
httpPort: 9200
image: docker.elastic.co/elasticsearch/elasticsearch
imagePullPolicy: IfNotPresent
imagePullSecrets: []
imageTag: 7.16.1
ingress:
  annotations:
    acme.cert-manager.io/http01-edit-in-place: "true"
    cert-manager.io/cluster-issuer: letsencrypt-prod
    ingress.kubernetes.io/ssl-redirect: "true"
    kubernetes.io/ingress.class: nginx
  enabled: false
  hosts:
  - testbed-example.com
  path: /
  tls:
  - hosts:
    - testbed-example.com
    secretName: testbed-es-tls-secret
initResources: {}
keystore: []
labels: {}
lifecycle: {}
masterService: ""
masterTerminationFix: false
maxUnavailable: 1
minimumMasterNodes: 2
nameOverride: ""
networkHost: 0.0.0.0
nodeAffinity: {}
nodeGroup: master
nodeSelector: {}
persistence:
  annotations: {}
  enabled: true
  labels:
    enabled: true
podAnnotations: {}
podManagementPolicy: Parallel
podSecurityContext:
  fsGroup: 1000
  runAsUser: 1000
podSecurityPolicy:
  create: false
  name: ""
  spec:
    fsGroup:
      rule: RunAsAny
    privileged: true
    runAsUser:
      rule: RunAsAny
    seLinux:
      rule: RunAsAny
    supplementalGroups:
      rule: RunAsAny
    volumes:
    - secret
    - configMap
    - persistentVolumeClaim
priorityClassName: ""
protocol: http
rbac:
  create: false
  serviceAccountAnnotations: {}
  serviceAccountName: ""
readinessProbe:
  failureThreshold: 4
  initialDelaySeconds: 20
  periodSeconds: 20
  successThreshold: 3
  timeoutSeconds: 10
replicas: 3
resources:
  limits:
    cpu: 1000m
    memory: 2Gi
  requests:
    cpu: 1000m
    memory: 2Gi
roles:
  data: "true"
  ingest: "true"
  master: "true"
  remote_cluster_client: "true"
schedulerName: ""
secretMounts:
- name: elastic-certificates
  path: /usr/share/elasticsearch/config/certs
  secretName: elastic-certificates
securityContext:
  capabilities:
    drop:
    - ALL
  runAsNonRoot: true
  runAsUser: 1000
service:
  annotations: {}
  externalTrafficPolicy: ""
  httpPortName: http
  labels: {}
  labelsHeadless: {}
  loadBalancerIP: ""
  loadBalancerSourceRanges: []
  nodePort: ""
  transportPortName: transport
  type: ClusterIP
sidecarResources: {}
sysctlInitContainer:
  enabled: true
sysctlVmMaxMapCount: 262144
terminationGracePeriod: 120
tolerations: []
transportPort: 9300
updateStrategy: RollingUpdate
volumeClaimTemplate:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi

COMPUTED VALUES:
antiAffinity: hard
antiAffinityTopologyKey: kubernetes.io/hostname
clusterHealthCheckParams: wait_for_status=green&timeout=2s
clusterName: elasticsearch
enableServiceLinks: true
envFrom: []
esConfig:
  elasticsearch.yml: |
    xpack.security.enabled: true
    xpack.security.transport.ssl.enabled: true
    xpack.security.transport.ssl.verification_mode: certificate
    xpack.security.transport.ssl.keystore.path: /usr/share/elasticsearch/config/certs/elastic-certificates.p12
    xpack.security.transport.ssl.truststore.path: /usr/share/elasticsearch/config/certs/elastic-certificates.p12
esJavaOpts: -Xmx1g -Xms1g
esMajorVersion: ""
extraContainers: []
extraEnvs:
- name: ELASTIC_PASSWORD
  valueFrom:
    secretKeyRef:
      key: password
      name: elastic-credentials
- name: ELASTIC_USERNAME
  valueFrom:
    secretKeyRef:
      key: username
      name: elastic-credentials
extraInitContainers: []
extraVolumeMounts: []
extraVolumes: []
fsGroup: ""
fullnameOverride: ""
httpPort: 9200
image: docker.elastic.co/elasticsearch/elasticsearch
imagePullPolicy: IfNotPresent
imagePullSecrets: []
imageTag: 7.16.1
ingress:
  annotations:
    acme.cert-manager.io/http01-edit-in-place: "true"
    cert-manager.io/cluster-issuer: letsencrypt-prod
    ingress.kubernetes.io/ssl-redirect: "true"
    kubernetes.io/ingress.class: nginx
  enabled: false
  hosts:
  - testbed-example.com
  path: /
  tls:
  - hosts:
    - testbed-example.com
    secretName: testbed-es-tls-secret
initResources: {}
keystore: []
labels: {}
lifecycle: {}
masterService: ""
masterTerminationFix: false
maxUnavailable: 1
minimumMasterNodes: 2
nameOverride: ""
networkHost: 0.0.0.0
nodeAffinity: {}
nodeGroup: master
nodeSelector: {}
persistence:
  annotations: {}
  enabled: true
  labels:
    enabled: true
podAnnotations: {}
podManagementPolicy: Parallel
podSecurityContext:
  fsGroup: 1000
  runAsUser: 1000
podSecurityPolicy:
  create: false
  name: ""
  spec:
    fsGroup:
      rule: RunAsAny
    privileged: true
    runAsUser:
      rule: RunAsAny
    seLinux:
      rule: RunAsAny
    supplementalGroups:
      rule: RunAsAny
    volumes:
    - secret
    - configMap
    - persistentVolumeClaim
priorityClassName: ""
protocol: http
rbac:
  create: false
  serviceAccountAnnotations: {}
  serviceAccountName: ""
readinessProbe:
  failureThreshold: 4
  initialDelaySeconds: 20
  periodSeconds: 20
  successThreshold: 3
  timeoutSeconds: 10
replicas: 3
resources:
  limits:
    cpu: 1000m
    memory: 2Gi
  requests:
    cpu: 1000m
    memory: 2Gi
roles:
  data: "true"
  ingest: "true"
  master: "true"
  remote_cluster_client: "true"
schedulerName: ""
secretMounts:
- name: elastic-certificates
  path: /usr/share/elasticsearch/config/certs
  secretName: elastic-certificates
securityContext:
  capabilities:
    drop:
    - ALL
  runAsNonRoot: true
  runAsUser: 1000
service:
  annotations: {}
  externalTrafficPolicy: ""
  httpPortName: http
  labels: {}
  labelsHeadless: {}
  loadBalancerIP: ""
  loadBalancerSourceRanges: []
  nodePort: ""
  transportPortName: transport
  type: ClusterIP
sidecarResources: {}
sysctlInitContainer:
  enabled: true
sysctlVmMaxMapCount: 262144
terminationGracePeriod: 120
tolerations: []
transportPort: 9300
updateStrategy: RollingUpdate
volumeClaimTemplate:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi

HOOKS:
---
# Source: elasticsearch/templates/test/test-elasticsearch-health.yaml
apiVersion: v1
kind: Pod
metadata:
  name: "testbed-elasticsearch-cmrxm-test"
  annotations:
    "helm.sh/hook": test-success
spec:
  securityContext:
    fsGroup: 1000
    runAsUser: 1000
  containers:
  - name: "testbed-elasticsearch-fussv-test"
    image: "docker.elastic.co/elasticsearch/elasticsearch:7.16.1"
    imagePullPolicy: "IfNotPresent"
    command:
      - "sh"
      - "-c"
      - |
        #!/usr/bin/env bash -e
        curl -XGET --fail 'elasticsearch-master:9200/_cluster/health?wait_for_status=green&timeout=2s'
  restartPolicy: Never
MANIFEST:
---
# Source: elasticsearch/templates/poddisruptionbudget.yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: "elasticsearch-master-pdb"
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: "elasticsearch-master"
---
# Source: elasticsearch/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: elasticsearch-master-config
  labels:
    heritage: "Helm"
    release: "testbed-elasticsearch"
    chart: "elasticsearch"
    app: "elasticsearch-master"
data:
  elasticsearch.yml: |
    xpack.security.enabled: true
    xpack.security.transport.ssl.enabled: true
    xpack.security.transport.ssl.verification_mode: certificate
    xpack.security.transport.ssl.keystore.path: /usr/share/elasticsearch/config/certs/elastic-certificates.p12
    xpack.security.transport.ssl.truststore.path: /usr/share/elasticsearch/config/certs/elastic-certificates.p12
---
# Source: elasticsearch/templates/service.yaml
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch-master
  labels:
    heritage: "Helm"
    release: "testbed-elasticsearch"
    chart: "elasticsearch"
    app: "elasticsearch-master"
  annotations:
    {}
spec:
  type: ClusterIP
  selector:
    release: "testbed-elasticsearch"
    chart: "elasticsearch"
    app: "elasticsearch-master"
  ports:
  - name: http
    protocol: TCP
    port: 9200
  - name: transport
    protocol: TCP
    port: 9300
---
# Source: elasticsearch/templates/service.yaml
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch-master-headless
  labels:
    heritage: "Helm"
    release: "testbed-elasticsearch"
    chart: "elasticsearch"
    app: "elasticsearch-master"
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
spec:
  clusterIP: None # This is needed for statefulset hostnames like elasticsearch-0 to resolve
  # Create endpoints also if the related pod isn't ready
  publishNotReadyAddresses: true
  selector:
    app: "elasticsearch-master"
  ports:
  - name: http
    port: 9200
  - name: transport
    port: 9300
---
# Source: elasticsearch/templates/statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch-master
  labels:
    heritage: "Helm"
    release: "testbed-elasticsearch"
    chart: "elasticsearch"
    app: "elasticsearch-master"
  annotations:
    esMajorVersion: "7"
spec:
  serviceName: elasticsearch-master-headless
  selector:
    matchLabels:
      app: "elasticsearch-master"
  replicas: 3
  podManagementPolicy: Parallel
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-master
      labels:
        heritage: "Helm"
        release: "testbed-elasticsearch"
        chart: "elasticsearch"
        app: "elasticsearch-master"
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 30Gi
  template:
    metadata:
      name: "elasticsearch-master"
      labels:
        heritage: "Helm"
        release: "testbed-elasticsearch"
        chart: "elasticsearch"
        app: "elasticsearch-master"
      annotations:
        
        configchecksum: 22a782517a55e99c9cbbf8968adf234abfbfc63c77e35116bc894d6820d3404
    spec:
      securityContext:
        fsGroup: 1000
        runAsUser: 1000
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - "elasticsearch-master"
            topologyKey: kubernetes.io/hostname
      terminationGracePeriodSeconds: 120
      volumes:
        - name: elastic-certificates
          secret:
            secretName: elastic-certificates
        - name: esconfig
          configMap:
            name: elasticsearch-master-config
      enableServiceLinks: true
      initContainers:
      - name: configure-sysctl
        securityContext:
          runAsUser: 0
          privileged: true
        image: "docker.elastic.co/elasticsearch/elasticsearch:7.16.1"
        imagePullPolicy: "IfNotPresent"
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        resources:
          {}

      containers:
      - name: "elasticsearch"
        securityContext:
          capabilities:
            drop:
            - ALL
          runAsNonRoot: true
          runAsUser: 1000
        image: "docker.elastic.co/elasticsearch/elasticsearch:7.16.1"
        imagePullPolicy: "IfNotPresent"
        readinessProbe:
          exec:
            command:
              - sh
              - -c
              - |
                #!/usr/bin/env bash -e
                # If the node is starting up wait for the cluster to be ready (request params: "wait_for_status=green&timeout=2s" )
                # Once it has started only check that the node itself is responding
                START_FILE=/tmp/.es_start_file

                # Disable nss cache to avoid filling dentry cache when calling curl
                # This is required with Elasticsearch Docker using nss < 3.52
                export NSS_SDB_USE_CACHE=no

                http () {
                  local path="${1}"
                  local args="${2}"
                  set -- -XGET -s

                  if [ "$args" != "" ]; then
                    set -- "$@" $args
                  fi

                  if [ -n "${ELASTIC_USERNAME}" ] && [ -n "${ELASTIC_PASSWORD}" ]; then
                    set -- "$@" -u "${ELASTIC_USERNAME}:${ELASTIC_PASSWORD}"
                  fi

                  curl --output /dev/null -k "$@" "http://127.0.0.1:9200${path}"
                }

                if [ -f "${START_FILE}" ]; then
                  echo 'Elasticsearch is already running, lets check the node is healthy'
                  HTTP_CODE=$(http "/" "-w %{http_code}")
                  RC=$?
                  if [[ ${RC} -ne 0 ]]; then
                    echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with RC ${RC}"
                    exit ${RC}
                  fi
                  # ready if HTTP code 200, 503 is tolerable if ES version is 6.x
                  if [[ ${HTTP_CODE} == "200" ]]; then
                    exit 0
                  elif [[ ${HTTP_CODE} == "503" && "7" == "6" ]]; then
                    exit 0
                  else
                    echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with HTTP code ${HTTP_CODE}"
                    exit 1
                  fi

                else
                  echo 'Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=2s" )'
                  if http "/_cluster/health?wait_for_status=green&timeout=2s" "--fail" ; then
                    touch ${START_FILE}
                    exit 0
                  else
                    echo 'Cluster is not yet ready (request params: "wait_for_status=green&timeout=2s" )'
                    exit 1
                  fi
                fi
          failureThreshold: 4
          initialDelaySeconds: 20
          periodSeconds: 20
          successThreshold: 3
          timeoutSeconds: 10
        ports:
        - name: http
          containerPort: 9200
        - name: transport
          containerPort: 9300
        resources:
          limits:
            cpu: 1000m
            memory: 2Gi
          requests:
            cpu: 1000m
            memory: 2Gi
        env:
          - name: node.name
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: cluster.initial_master_nodes
            value: "elasticsearch-master-0,elasticsearch-master-1,elasticsearch-master-2,"
          - name: discovery.seed_hosts
            value: "elasticsearch-master-headless"
          - name: cluster.name
            value: "elasticsearch"
          - name: network.host
            value: "0.0.0.0"
          - name: ES_JAVA_OPTS
            value: "-Xmx1g -Xms1g"
          - name: node.data
            value: "true"
          - name: node.ingest
            value: "true"
          - name: node.master
            value: "true"
          - name: node.remote_cluster_client
            value: "true"
          - name: ELASTIC_PASSWORD
            valueFrom:
              secretKeyRef:
                key: password
                name: elastic-credentials
          - name: ELASTIC_USERNAME
            valueFrom:
              secretKeyRef:
                key: username
                name: elastic-credentials
        volumeMounts:
          - name: "elasticsearch-master"
            mountPath: /usr/share/elasticsearch/data

          - name: elastic-certificates
            mountPath: /usr/share/elasticsearch/config/certs
          - name: esconfig
            mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
            subPath: elasticsearch.yml

NOTES:
1. Watch all cluster members come up.
  $ kubectl get pods --namespace=testbed-elasticsearch -l app=elasticsearch-master -w
2. Test cluster health using Helm test.
  $ helm test testbed-elasticsearch --cleanup

Describe the bug:

After initiating a version upgrade, the cluster stays up: it turns yellow first, then returns to green after a while. However, one pod (the first one replaced by the upgrade) remains unhealthy, and the remaining pods are never updated.
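
One quick way to confirm the cluster itself really is green is to query the health API from inside a pod (a sanity check added here for illustration, not part of the original report; it assumes the ELASTIC_USERNAME and ELASTIC_PASSWORD variables injected via extraEnvs are present in the container, as the manifest above shows):

  kubectl exec elasticsearch-master-0 --namespace=testbed-elasticsearch -- \
    sh -c 'curl -s -u "$ELASTIC_USERNAME:$ELASTIC_PASSWORD" "http://127.0.0.1:9200/_cluster/health?pretty"'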

kubectl describe gives:

  Normal   Scheduled  28m   default-scheduler  Successfully assigned testbed-elasticsearch/elasticsearch-master-2 to aks-agentpool-41636598-2
  Normal   Pulling    28m   kubelet            Pulling image "docker.elastic.co/elasticsearch/elasticsearch:7.16.1"
  Normal   Pulled     28m   kubelet            Successfully pulled image "docker.elastic.co/elasticsearch/elasticsearch:7.16.1" in 12.054403933s
  Normal   Created    28m   kubelet            Created container configure-sysctl
  Normal   Started    28m   kubelet            Started container configure-sysctl
  Normal   Pulled     28m   kubelet            Container image "docker.elastic.co/elasticsearch/elasticsearch:7.16.1" already present on machine
  Normal   Created    28m   kubelet            Created container elasticsearch
  Normal   Started    28m   kubelet            Started container elasticsearch
  Warning  Unhealthy  27m   kubelet            Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=2s" )
Cluster is not yet ready (request params: "wait_for_status=green&timeout=2s" )
  Warning  Unhealthy  3m26s (x72 over 27m)  kubelet  Readiness probe failed: Elasticsearch is already running, lets check the node is healthy
curl --output /dev/null -k -XGET -s -w '%{http_code}' ${BASIC_AUTH} http://127.0.0.1:9200/ failed with HTTP code 200
sh: 30: [[: not found
sh: 35: [[: not found
sh: 37: [[: not found
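
The sh: 30: [[: not found lines are the key clue: the chart's readiness probe script is launched with sh -c but uses bash-only [[ ]] tests. The 7.16 Elasticsearch images appear to be based on Ubuntu, where /bin/sh is dash, which has no [[ builtin, so every comparison errors out and the script falls through to the failure branch even though the node answered with HTTP 200. A minimal sketch of the difference (an illustration of the POSIX-compatible form, not the chart's actual patch):

  # bash-only: dash aborts with "[[: not found"
  if [[ ${HTTP_CODE} == "200" ]]; then
    exit 0
  fi

  # POSIX test: works under dash and bash alike
  if [ "${HTTP_CODE}" = "200" ]; then
    exit 0
  fi

Because the probe can never report success, the first replaced pod never becomes ready and the rolling update stalls there, which matches the behavior described above.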

Steps to reproduce:

  1. helm upgrade testbed-elasticsearch . --namespace testbed-elasticsearch --create-namespace -f values.yaml
  2. kubectl describe pod elasticsearch-master-2 --namespace=testbed-elasticsearch

Expected behavior: The upgrade succeeds.
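
The stalled rollout can also be watched directly (a hypothetical check, using the StatefulSet name from the manifest above):

  kubectl rollout status statefulset/elasticsearch-master --namespace=testbed-elasticsearch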

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

2 reactions
hhaslam11 commented, Dec 15, 2021

I have the same issue with a fresh deploy on version 7.16.1

UPDATE: after running helm repo update as ebuildy suggested, it worked

1 reaction
ebuildy commented, Dec 15, 2021

You must also upgrade the Helm chart to 7.16.1, just released yesterday!
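
In practice that means refreshing the local chart cache and pinning the chart version on upgrade (a sketch, assuming the chart is installed from the elastic Helm repository rather than the local checkout used in the repro steps):

  helm repo update
  helm upgrade testbed-elasticsearch elastic/elasticsearch \
    --version 7.16.1 \
    --namespace testbed-elasticsearch \
    -f values.yaml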

Read more comments on GitHub >

Top Results From Across the Web

Pod Lifecycle | Kubernetes
Pod network readiness: After a Pod gets scheduled on a node, it needs to be admitted by the Kubelet and have any volumes...
Read more >
How do I initiate a pod replacement when it's unhealthy?
my question is, how do i get it to just off itself? the only time the pods get replaced is if i update...
Read more >
Replacing an unhealthy etcd member | Backup and restore
This process depends on whether the etcd member is unhealthy because the machine is not running or the node is not ready, or...
Read more >
Kubernetes Health Alerts - Atomist Blog
When an unhealthy pod is detected, an alert message is sent to a channel in Slack or Microsoft Teams. For example, here is...
Read more >
Unhealthy pods are not removed from consul on k8s
Also as a side notes, this issue happens when pods are updated via helm upgrade , consul adds the new pods/services but does...
Read more >
