
[elasticsearch] Readiness probe is failing again with 8.0.0-SNAPSHOT and default config


Chart version: 8.0.0-SNAPSHOT

Kubernetes version: all

Kubernetes provider: all

Helm Version: all

helm get release output:

NAME: es
LAST DEPLOYED: Fri Nov  5 17:27:14 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
USER-SUPPLIED VALUES:
null

COMPUTED VALUES:
antiAffinity: hard
antiAffinityTopologyKey: kubernetes.io/hostname
clusterHealthCheckParams: wait_for_status=green&timeout=1s
clusterName: elasticsearch
enableServiceLinks: true
envFrom: []
esConfig: {}
esJavaOpts: ""
esMajorVersion: ""
extraContainers: []
extraEnvs: []
extraInitContainers: []
extraVolumeMounts: []
extraVolumes: []
fsGroup: ""
fullnameOverride: ""
healthNameOverride: ""
hostAliases: []
httpPort: 9200
image: docker.elastic.co/elasticsearch/elasticsearch
imagePullPolicy: IfNotPresent
imagePullSecrets: []
imageTag: 8.0.0-SNAPSHOT
ingress:
  annotations: {}
  className: nginx
  enabled: false
  hosts:
  - host: chart-example.local
    paths:
    - path: /
  pathtype: ImplementationSpecific
  tls: []
initResources: {}
keystore: []
labels: {}
lifecycle: {}
masterService: ""
maxUnavailable: 1
minimumMasterNodes: 2
nameOverride: ""
networkHost: 0.0.0.0
networkPolicy:
  http:
    enabled: false
  transport:
    enabled: false
nodeAffinity: {}
nodeGroup: master
nodeSelector: {}
persistence:
  annotations: {}
  enabled: true
  labels:
    enabled: false
podAnnotations: {}
podManagementPolicy: Parallel
podSecurityContext:
  fsGroup: 1000
  runAsUser: 1000
podSecurityPolicy:
  create: false
  name: ""
  spec:
    fsGroup:
      rule: RunAsAny
    privileged: true
    runAsUser:
      rule: RunAsAny
    seLinux:
      rule: RunAsAny
    supplementalGroups:
      rule: RunAsAny
    volumes:
    - secret
    - configMap
    - persistentVolumeClaim
    - emptyDir
priorityClassName: ""
protocol: http
rbac:
  automountToken: true
  create: false
  serviceAccountAnnotations: {}
  serviceAccountName: ""
readinessProbe:
  failureThreshold: 3
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 3
  timeoutSeconds: 5
replicas: 3
resources:
  limits:
    cpu: 1000m
    memory: 2Gi
  requests:
    cpu: 1000m
    memory: 2Gi
roles:
- master
- data
- data_content
- data_hot
- data_warm
- data_cold
- ingest
- ml
- remote_cluster_client
- transform
schedulerName: ""
secret:
  enabled: true
  password: ""
secretMounts: []
securityContext:
  capabilities:
    drop:
    - ALL
  runAsNonRoot: true
  runAsUser: 1000
service:
  annotations: {}
  enabled: true
  externalTrafficPolicy: ""
  httpPortName: http
  labels: {}
  labelsHeadless: {}
  loadBalancerIP: ""
  loadBalancerSourceRanges: []
  nodePort: ""
  transportPortName: transport
  type: ClusterIP
sysctlInitContainer:
  enabled: true
sysctlVmMaxMapCount: 262144
terminationGracePeriod: 120
tests:
  enabled: true
tolerations: []
transportPort: 9300
updateStrategy: RollingUpdate
volumeClaimTemplate:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi

HOOKS:
---
# Source: elasticsearch/templates/test/test-elasticsearch-health.yaml
apiVersion: v1
kind: Pod
metadata:
  name: "es-qzanl-test"
  annotations:
    "helm.sh/hook": test
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  securityContext:
    fsGroup: 1000
    runAsUser: 1000
  containers:
  - name: "es-novrx-test"
    image: "docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT"
    imagePullPolicy: "IfNotPresent"
    command:
      - "sh"
      - "-c"
      - |
        #!/usr/bin/env bash -e
        curl -XGET --fail 'elasticsearch-master:9200/_cluster/health?wait_for_status=green&timeout=1s'
  restartPolicy: Never
MANIFEST:
---
# Source: elasticsearch/templates/poddisruptionbudget.yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: "elasticsearch-master-pdb"
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: "elasticsearch-master"
---
# Source: elasticsearch/templates/secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: elasticsearch-master-credentials
  labels:
    heritage: "Helm"
    release: "es"
    chart: "elasticsearch"
    app: "elasticsearch-master"
type: Opaque
data:
  username: ZWxhc3RpYw==
  password: "dDFIa1VCTkxIRTE0VkdyRQ=="
---
# Source: elasticsearch/templates/service.yaml
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch-master
  labels:
    heritage: "Helm"
    release: "es"
    chart: "elasticsearch"
    app: "elasticsearch-master"
  annotations:
    {}
spec:
  type: ClusterIP
  selector:
    release: "es"
    chart: "elasticsearch"
    app: "elasticsearch-master"
  ports:
  - name: http
    protocol: TCP
    port: 9200
  - name: transport
    protocol: TCP
    port: 9300
---
# Source: elasticsearch/templates/service.yaml
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch-master-headless
  labels:
    heritage: "Helm"
    release: "es"
    chart: "elasticsearch"
    app: "elasticsearch-master"
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
spec:
  clusterIP: None # This is needed for statefulset hostnames like elasticsearch-0 to resolve
  # Create endpoints also if the related pod isn't ready
  publishNotReadyAddresses: true
  selector:
    app: "elasticsearch-master"
  ports:
  - name: http
    port: 9200
  - name: transport
    port: 9300
---
# Source: elasticsearch/templates/statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch-master
  labels:
    heritage: "Helm"
    release: "es"
    chart: "elasticsearch"
    app: "elasticsearch-master"
  annotations:
    esMajorVersion: "8"
spec:
  serviceName: elasticsearch-master-headless
  selector:
    matchLabels:
      app: "elasticsearch-master"
  replicas: 3
  podManagementPolicy: Parallel
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-master
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 30Gi
  template:
    metadata:
      name: "elasticsearch-master"
      labels:
        release: "es"
        chart: "elasticsearch"
        app: "elasticsearch-master"
      annotations:
        
    spec:
      securityContext:
        fsGroup: 1000
        runAsUser: 1000
      automountServiceAccountToken: true
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - "elasticsearch-master"
            topologyKey: kubernetes.io/hostname
      terminationGracePeriodSeconds: 120
      volumes:
      enableServiceLinks: true
      initContainers:
      - name: configure-sysctl
        securityContext:
          runAsUser: 0
          privileged: true
        image: "docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT"
        imagePullPolicy: "IfNotPresent"
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        resources:
          {}

      containers:
      - name: "elasticsearch"
        securityContext:
          capabilities:
            drop:
            - ALL
          runAsNonRoot: true
          runAsUser: 1000
        image: "docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT"
        imagePullPolicy: "IfNotPresent"
        readinessProbe:
          exec:
            command:
              - sh
              - -c
              - |
                #!/usr/bin/env bash -e

                # Exit if ELASTIC_PASSWORD in unset
                if [ -z "${ELASTIC_PASSWORD}" ]; then
                  echo "ELASTIC_PASSWORD variable is missing, exiting"
                  exit 1
                fi

                # If the node is starting up wait for the cluster to be ready (request params: "wait_for_status=green&timeout=1s" )
                # Once it has started only check that the node itself is responding
                START_FILE=/tmp/.es_start_file

                # Disable nss cache to avoid filling dentry cache when calling curl
                # This is required with Elasticsearch Docker using nss < 3.52
                export NSS_SDB_USE_CACHE=no

                http () {
                  local path="${1}"
                  local args="${2}"
                  set -- -XGET -s

                  if [ "$args" != "" ]; then
                    set -- "$@" $args
                  fi

                  set -- "$@" -u "elastic:${ELASTIC_PASSWORD}"

                  curl --output /dev/null -k "$@" "http://127.0.0.1:9200${path}"
                }

                if [ -f "${START_FILE}" ]; then
                  echo 'Elasticsearch is already running, lets check the node is healthy'
                  HTTP_CODE=$(http "/" "-w %{http_code}")
                  RC=$?
                  if [[ ${RC} -ne 0 ]]; then
                    echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with RC ${RC}"
                    exit ${RC}
                  fi
                  # ready if HTTP code 200, 503 is tolerable if ES version is 6.x
                  if [[ ${HTTP_CODE} == "200" ]]; then
                    exit 0
                  elif [[ ${HTTP_CODE} == "503" && "8" == "6" ]]; then
                    exit 0
                  else
                    echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with HTTP code ${HTTP_CODE}"
                    exit 1
                  fi

                else
                  echo 'Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )'
                  if http "/_cluster/health?wait_for_status=green&timeout=1s" "--fail" ; then
                    touch ${START_FILE}
                    exit 0
                  else
                    echo 'Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )'
                    exit 1
                  fi
                fi
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 3
          timeoutSeconds: 5
        ports:
        - name: http
          containerPort: 9200
        - name: transport
          containerPort: 9300
        resources:
          limits:
            cpu: 1000m
            memory: 2Gi
          requests:
            cpu: 1000m
            memory: 2Gi
        env:
          - name: node.name
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: cluster.initial_master_nodes
            value: "elasticsearch-master-0,elasticsearch-master-1,elasticsearch-master-2,"
          - name: node.roles
            value: "master,data,data_content,data_hot,data_warm,data_cold,ingest,ml,remote_cluster_client,transform,"
          - name: discovery.seed_hosts
            value: "elasticsearch-master-headless"
          - name: cluster.name
            value: "elasticsearch"
          - name: network.host
            value: "0.0.0.0"
          - name: ELASTIC_PASSWORD
            valueFrom:
              secretKeyRef:
                name: elasticsearch-master-credentials
                key: password
        volumeMounts:
          - name: "elasticsearch-master"
            mountPath: /usr/share/elasticsearch/data

NOTES:
1. Watch all cluster members come up.
  $ kubectl get pods --namespace=default -l app=elasticsearch-master -w
2. Retrieve elastic user's password.
  $ kubectl get secrets --namespace=default elasticsearch-master-credentials -ojsonpath='{.data.password}' | base64 -d
3. Test cluster health using Helm test.
  $ helm --namespace=default test es

Describe the bug:

When using 8.0.0-SNAPSHOT and default values (default Elasticsearch config, security not enforced), the Elasticsearch chart fails to deploy: the pods never reach the ready state because the readiness probe keeps failing:

$ kubectl describe pod elasticsearch-master-0
...
  Normal   Started                 116s (x3 over 3m37s)  kubelet                  Started container elasticsearch
  Warning  Unhealthy               76s (x11 over 3m26s)  kubelet                  Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )
Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )
  Warning  BackOff  72s (x2 over 2m8s)  kubelet  Back-off restarting failed container
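
The same probe failures can also be listed as events scoped to a single pod, which is sometimes easier to read than the full describe output (a convenience, assuming the default namespace used in this report):

$ kubectl get events --namespace=default --field-selector involvedObject.name=elasticsearch-master-0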

This seems to be related to the new behavior where Elasticsearch generates TLS certificates by default: https://github.com/elastic/elasticsearch/pull/77231 (cc @elastic/es-delivery @jkakavas @mgreau @nkammah).
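
Until the chart catches up, one workaround consistent with the bootstrap-check error quoted below is to disable security explicitly through the chart's esConfig pass-through (visible in the COMPUTED VALUES above). This is a minimal sketch, not an official fix, and the values file name is made up for the example:

# values-insecure.yaml (hypothetical name): disables security, which should
# avoid the "Transport SSL must be enabled" bootstrap check quoted below
esConfig:
  elasticsearch.yml: |
    xpack.security.enabled: false

$ helm install es ./elasticsearch -f values-insecure.yaml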

Steps to reproduce:

  1. Deploy the Elasticsearch chart with default values from the master branch:
$ cd helm-charts
$ git checkout master
$ helm install es ./elasticsearch
  2. That’s all.
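
To watch the failure, the pod watch printed in the chart NOTES above is enough; the pods stay not-ready indefinitely:

$ kubectl get pods --namespace=default -l app=elasticsearch-master -w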

Expected behavior: Readiness probe should succeed.

Provide logs and/or server output (if relevant):

Elasticsearch logs
ERROR: [1] bootstrap checks failed. You must address the points described in the following [1] lines before starting Elasticsearch.
bootstrap check failure [1] of [1]: Transport SSL must be enabled if security is enabled. Please set [xpack.security.transport.ssl.enabled] to [true] or disable security by setting [xpack.security.enabled] to [false]
ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/elasticsearch.log
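
These are the crashing container's logs; they can be pulled with kubectl (pod name taken from the describe output above):

$ kubectl logs --namespace=default elasticsearch-master-0
$ kubectl logs --namespace=default --previous elasticsearch-master-0   # if the container already restarted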

Any additional context:

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 2
  • Comments: 9 (2 by maintainers)

Top GitHub Comments

1 reaction
q-leobrack commented, Dec 13, 2021

Hi @gohmc @mloeschner, this is a different issue. We are working toward fixing it and should publish a new 7.16.1 release soon.

@jmlrt is there an issue link for this one? Thanks

EDIT: Couldn’t spot one. Opened https://github.com/elastic/helm-charts/issues/1473

1 reaction
mloeschner commented, Dec 13, 2021

It’s simply the default shell of the container, which can’t handle double square brackets. Change it to bash, for example, and it’ll work:

readinessProbe:
  exec:
    command:
      - bash   # was: sh
      - -c
      - |
        # ... rest of the probe script unchanged ...
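
Alternatively, the probe's bashisms can be rewritten in POSIX sh so the default shell works unmodified. A sketch of the affected tests only, assuming the rest of the probe script stays as shown in the manifest above:

# POSIX single-bracket equivalents of the [[ ... ]] tests in the probe script
if [ "${RC}" -ne 0 ]; then
  echo "node health check failed with RC ${RC}"
  exit "${RC}"
fi
if [ "${HTTP_CODE}" = "200" ]; then
  exit 0
fi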