[elasticsearch] Readiness probe is failing again with 8.0.0-SNAPSHOT and default config
See original GitHub issueChart version: 8.0.0-SNAPSHOT
Kubernetes version: all
Kubernetes provider: all
Helm Version: all
helm get release
output:
Output of helm get release
NAME: es
LAST DEPLOYED: Fri Nov 5 17:27:14 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
USER-SUPPLIED VALUES:
null
COMPUTED VALUES:
antiAffinity: hard
antiAffinityTopologyKey: kubernetes.io/hostname
clusterHealthCheckParams: wait_for_status=green&timeout=1s
clusterName: elasticsearch
enableServiceLinks: true
envFrom: []
esConfig: {}
esJavaOpts: ""
esMajorVersion: ""
extraContainers: []
extraEnvs: []
extraInitContainers: []
extraVolumeMounts: []
extraVolumes: []
fsGroup: ""
fullnameOverride: ""
healthNameOverride: ""
hostAliases: []
httpPort: 9200
image: docker.elastic.co/elasticsearch/elasticsearch
imagePullPolicy: IfNotPresent
imagePullSecrets: []
imageTag: 8.0.0-SNAPSHOT
ingress:
annotations: {}
className: nginx
enabled: false
hosts:
- host: chart-example.local
paths:
- path: /
pathtype: ImplementationSpecific
tls: []
initResources: {}
keystore: []
labels: {}
lifecycle: {}
masterService: ""
maxUnavailable: 1
minimumMasterNodes: 2
nameOverride: ""
networkHost: 0.0.0.0
networkPolicy:
http:
enabled: false
transport:
enabled: false
nodeAffinity: {}
nodeGroup: master
nodeSelector: {}
persistence:
annotations: {}
enabled: true
labels:
enabled: false
podAnnotations: {}
podManagementPolicy: Parallel
podSecurityContext:
fsGroup: 1000
runAsUser: 1000
podSecurityPolicy:
create: false
name: ""
spec:
fsGroup:
rule: RunAsAny
privileged: true
runAsUser:
rule: RunAsAny
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- secret
- configMap
- persistentVolumeClaim
- emptyDir
priorityClassName: ""
protocol: http
rbac:
automountToken: true
create: false
serviceAccountAnnotations: {}
serviceAccountName: ""
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 3
timeoutSeconds: 5
replicas: 3
resources:
limits:
cpu: 1000m
memory: 2Gi
requests:
cpu: 1000m
memory: 2Gi
roles:
- master
- data
- data_content
- data_hot
- data_warm
- data_cold
- ingest
- ml
- remote_cluster_client
- transform
schedulerName: ""
secret:
enabled: true
password: ""
secretMounts: []
securityContext:
capabilities:
drop:
- ALL
runAsNonRoot: true
runAsUser: 1000
service:
annotations: {}
enabled: true
externalTrafficPolicy: ""
httpPortName: http
labels: {}
labelsHeadless: {}
loadBalancerIP: ""
loadBalancerSourceRanges: []
nodePort: ""
transportPortName: transport
type: ClusterIP
sysctlInitContainer:
enabled: true
sysctlVmMaxMapCount: 262144
terminationGracePeriod: 120
tests:
enabled: true
tolerations: []
transportPort: 9300
updateStrategy: RollingUpdate
volumeClaimTemplate:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 30Gi
HOOKS:
---
# Source: elasticsearch/templates/test/test-elasticsearch-health.yaml
apiVersion: v1
kind: Pod
metadata:
name: "es-qzanl-test"
annotations:
"helm.sh/hook": test
"helm.sh/hook-delete-policy": hook-succeeded
spec:
securityContext:
fsGroup: 1000
runAsUser: 1000
containers:
- name: "es-novrx-test"
image: "docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT"
imagePullPolicy: "IfNotPresent"
command:
- "sh"
- "-c"
- |
#!/usr/bin/env bash -e
curl -XGET --fail 'elasticsearch-master:9200/_cluster/health?wait_for_status=green&timeout=1s'
restartPolicy: Never
MANIFEST:
---
# Source: elasticsearch/templates/poddisruptionbudget.yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
name: "elasticsearch-master-pdb"
spec:
maxUnavailable: 1
selector:
matchLabels:
app: "elasticsearch-master"
---
# Source: elasticsearch/templates/secret.yaml
apiVersion: v1
kind: Secret
metadata:
name: elasticsearch-master-credentials
labels:
heritage: "Helm"
release: "es"
chart: "elasticsearch"
app: "elasticsearch-master"
type: Opaque
data:
username: ZWxhc3RpYw==
password: "dDFIa1VCTkxIRTE0VkdyRQ=="
---
# Source: elasticsearch/templates/service.yaml
kind: Service
apiVersion: v1
metadata:
name: elasticsearch-master
labels:
heritage: "Helm"
release: "es"
chart: "elasticsearch"
app: "elasticsearch-master"
annotations:
{}
spec:
type: ClusterIP
selector:
release: "es"
chart: "elasticsearch"
app: "elasticsearch-master"
ports:
- name: http
protocol: TCP
port: 9200
- name: transport
protocol: TCP
port: 9300
---
# Source: elasticsearch/templates/service.yaml
kind: Service
apiVersion: v1
metadata:
name: elasticsearch-master-headless
labels:
heritage: "Helm"
release: "es"
chart: "elasticsearch"
app: "elasticsearch-master"
annotations:
service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
spec:
clusterIP: None # This is needed for statefulset hostnames like elasticsearch-0 to resolve
# Create endpoints also if the related pod isn't ready
publishNotReadyAddresses: true
selector:
app: "elasticsearch-master"
ports:
- name: http
port: 9200
- name: transport
port: 9300
---
# Source: elasticsearch/templates/statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: elasticsearch-master
labels:
heritage: "Helm"
release: "es"
chart: "elasticsearch"
app: "elasticsearch-master"
annotations:
esMajorVersion: "8"
spec:
serviceName: elasticsearch-master-headless
selector:
matchLabels:
app: "elasticsearch-master"
replicas: 3
podManagementPolicy: Parallel
updateStrategy:
type: RollingUpdate
volumeClaimTemplates:
- metadata:
name: elasticsearch-master
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 30Gi
template:
metadata:
name: "elasticsearch-master"
labels:
release: "es"
chart: "elasticsearch"
app: "elasticsearch-master"
annotations:
spec:
securityContext:
fsGroup: 1000
runAsUser: 1000
automountServiceAccountToken: true
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- "elasticsearch-master"
topologyKey: kubernetes.io/hostname
terminationGracePeriodSeconds: 120
volumes:
enableServiceLinks: true
initContainers:
- name: configure-sysctl
securityContext:
runAsUser: 0
privileged: true
image: "docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT"
imagePullPolicy: "IfNotPresent"
command: ["sysctl", "-w", "vm.max_map_count=262144"]
resources:
{}
containers:
- name: "elasticsearch"
securityContext:
capabilities:
drop:
- ALL
runAsNonRoot: true
runAsUser: 1000
image: "docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT"
imagePullPolicy: "IfNotPresent"
readinessProbe:
exec:
command:
- sh
- -c
- |
#!/usr/bin/env bash -e
# Exit if ELASTIC_PASSWORD in unset
if [ -z "${ELASTIC_PASSWORD}" ]; then
echo "ELASTIC_PASSWORD variable is missing, exiting"
exit 1
fi
# If the node is starting up wait for the cluster to be ready (request params: "wait_for_status=green&timeout=1s" )
# Once it has started only check that the node itself is responding
START_FILE=/tmp/.es_start_file
# Disable nss cache to avoid filling dentry cache when calling curl
# This is required with Elasticsearch Docker using nss < 3.52
export NSS_SDB_USE_CACHE=no
http () {
local path="${1}"
local args="${2}"
set -- -XGET -s
if [ "$args" != "" ]; then
set -- "$@" $args
fi
set -- "$@" -u "elastic:${ELASTIC_PASSWORD}"
curl --output /dev/null -k "$@" "http://127.0.0.1:9200${path}"
}
if [ -f "${START_FILE}" ]; then
echo 'Elasticsearch is already running, lets check the node is healthy'
HTTP_CODE=$(http "/" "-w %{http_code}")
RC=$?
if [[ ${RC} -ne 0 ]]; then
echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with RC ${RC}"
exit ${RC}
fi
# ready if HTTP code 200, 503 is tolerable if ES version is 6.x
if [[ ${HTTP_CODE} == "200" ]]; then
exit 0
elif [[ ${HTTP_CODE} == "503" && "8" == "6" ]]; then
exit 0
else
echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with HTTP code ${HTTP_CODE}"
exit 1
fi
else
echo 'Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )'
if http "/_cluster/health?wait_for_status=green&timeout=1s" "--fail" ; then
touch ${START_FILE}
exit 0
else
echo 'Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )'
exit 1
fi
fi
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 3
timeoutSeconds: 5
ports:
- name: http
containerPort: 9200
- name: transport
containerPort: 9300
resources:
limits:
cpu: 1000m
memory: 2Gi
requests:
cpu: 1000m
memory: 2Gi
env:
- name: node.name
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: cluster.initial_master_nodes
value: "elasticsearch-master-0,elasticsearch-master-1,elasticsearch-master-2,"
- name: node.roles
value: "master,data,data_content,data_hot,data_warm,data_cold,ingest,ml,remote_cluster_client,transform,"
- name: discovery.seed_hosts
value: "elasticsearch-master-headless"
- name: cluster.name
value: "elasticsearch"
- name: network.host
value: "0.0.0.0"
- name: ELASTIC_PASSWORD
valueFrom:
secretKeyRef:
name: elasticsearch-master-credentials
key: password
volumeMounts:
- name: "elasticsearch-master"
mountPath: /usr/share/elasticsearch/data
NOTES:
1. Watch all cluster members come up.
$ kubectl get pods --namespace=default -l app=elasticsearch-master -w
2. Retrieve elastic user's password.
$ kubectl get secrets --namespace=default elasticsearch-master-credentials -ojsonpath='{.data.password}' | base64 -d
3. Test cluster health using Helm test.
$ helm --namespace=default test es
Describe the bug:
When using 8.0.0-SNAPSHOT
and default values (default elasticsearch config, security not enforced), Elasticsearch chart fails to deploy, with pods never reaching ready state due to Readiness probe failing:
$ kubectl describe pod elasticsearch-master-0
...
Normal Started 116s (x3 over 3m37s) kubelet Started container elasticsearch
Warning Unhealthy 76s (x11 over 3m26s) kubelet Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )
Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )
Warning BackOff 72s (x2 over 2m8s) kubelet Back-off restarting failed container
This seems to be related to the new behavior where Elasticsearch generates TLS certificates by default: https://github.com/elastic/elasticsearch/pull/77231 cc @elastic/es-delivery @jkakavas @mgreau @nkammah
Steps to reproduce:
- Try deploy Elasticsearch chart with default values from master branch
$ cd helm-charts
$ git checkout master
$ helm install es ./elasticsearch
- That’s all
Expected behavior: Readiness probe should success.
Provide logs and/or server output (if relevant):
Elasticsearch logs
ERROR: [1] bootstrap checks failed. You must address the points described in the following [1] lines before starting Elasticsearch.
bootstrap check failure [1] of [1]: Transport SSL must be enabled if security is enabled. Please set [xpack.security.transport.ssl.enabled] to [true] or disable security by setting [xpack.security.enabled] to [false]
ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/elasticsearch.log
Any additional context:
Issue Analytics
- State:
- Created 2 years ago
- Reactions:2
- Comments:9 (2 by maintainers)
Top Results From Across the Web
Elasticsearch readiness probe failures - Elastic Discuss
We would like to understand why this happens? Our usecase is purely log ingestion with the ELKB stack, the setup consists of a...
Read more >Elasticsearch pod readiness probe fails with "message"
The solution to my problem was so easy, that I did not behave. I narrowed down the problem to the TLS handshakes failing....
Read more >Configuration | Camunda Platform 8 Docs
By default, the configuration for Tasklist is stored in a YAML file application.yml . ... Readiness probe as yaml config; Liveness probe as...
Read more >Configure Health Probes - Open Service Mesh Documentation
For HTTP probes, the following table shows the modified path and port for each probe type. Probe, Path, Port. Liveness, /osm-liveness-probe, 15901. Readiness,...
Read more >[elasticsearch] Readiness probe is failing with 8.0.0 ... - bytemeta
When using 8.0.0-SNAPSHOT and default values (default elasticsearch config, security not enforced), Elasticsearch chart fails to deploy, with pods never ...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@jmlrt is there an issue link for this one? Thanks
EDIT: Couldn’t spot one. Opened https://github.com/elastic/helm-charts/issues/1473
It’s simply the default shell of the container which is not capable to use double square-brackets. Change it to
bash
for example and it’ll work.