question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Elasticsearch pods fail to startup

See original GitHub issue

Chart Version 7.3.0 Kubernetes Version 1.13 Kubernetes provider AWS (EKS cluster) Helm Version 2.11.0 Everytime I run helm install --name elasticsearch elastic/elasticsearch --namespace elasticsearch --set resources.requests.memory=1.5Gi --set resources.limits.memory=1.5Gi --tiller-namespace spinnaker it appears to be deploying the master pods but they don’t start.

NAME:   elasticsearch
LAST DEPLOYED: Wed Sep 11 17:49:30 2019
NAMESPACE: elasticsearch
STATUS: DEPLOYED

RESOURCES:
==> v1/Pod(related)
NAME                    READY  STATUS       RESTARTS  AGE
elasticsearch-master-0  0/1    Init:0/1     0         1s
elasticsearch-master-1  0/1    Terminating  0         5m46s
elasticsearch-master-2  0/1    Terminating  4         5m46s

==> v1/Service
NAME                           TYPE       CLUSTER-IP    EXTERNAL-IP  PORT(S)            AGE
elasticsearch-master           ClusterIP  172.20.72.18  <none>       9200/TCP,9300/TCP  1s
elasticsearch-master-headless  ClusterIP  None          <none>       9200/TCP,9300/TCP  1s

==> v1beta1/PodDisruptionBudget
NAME                      MIN AVAILABLE  MAX UNAVAILABLE  ALLOWED DISRUPTIONS  AGE
elasticsearch-master-pdb  N/A            1                0                    1s

==> v1beta1/StatefulSet
NAME                  READY  AGE
elasticsearch-master  0/3    1s


NOTES:
1. Watch all cluster members come up.
  $ kubectl get pods --namespace=elasticsearch -l app=elasticsearch-master -w
2. Test cluster health using Helm test.
  $ helm test elasticsearch

Here is the output of running helm get elasticsearch --tiller-namespace spinnaker

REVISION: 1
RELEASED: Wed Sep 11 17:49:30 2019
CHART: elasticsearch-7.3.0
USER-SUPPLIED VALUES:
resources:
  limits:
    memory: 1.5Gi
  requests:
    memory: 1.5Gi

COMPUTED VALUES:
antiAffinity: hard
antiAffinityTopologyKey: kubernetes.io/hostname
clusterHealthCheckParams: wait_for_status=green&timeout=1s
clusterName: elasticsearch
esConfig: {}
esJavaOpts: -Xmx1g -Xms1g
esMajorVersion: ""
extraEnvs: []
extraInitContainers: []
extraVolumeMounts: []
extraVolumes: []
fsGroup: ""
fullnameOverride: ""
httpPort: 9200
image: docker.elastic.co/elasticsearch/elasticsearch
imagePullPolicy: IfNotPresent
imagePullSecrets: []
imageTag: 7.3.0
ingress:
  annotations: {}
  enabled: false
  hosts:
  - chart-example.local
  path: /
  tls: []
initResources: {}
labels: {}
lifecycle: {}
masterService: ""
masterTerminationFix: false
maxUnavailable: 1
minimumMasterNodes: 2
nameOverride: ""
networkHost: 0.0.0.0
nodeAffinity: {}
nodeGroup: master
nodeSelector: {}
persistence:
  annotations: {}
  enabled: true
podAnnotations: {}
podManagementPolicy: Parallel
podSecurityContext:
  fsGroup: 1000
priorityClassName: ""
protocol: http
readinessProbe:
  failureThreshold: 3
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 3
  timeoutSeconds: 5
replicas: 3
resources:
  limits:
    cpu: 1000m
    memory: 1.5Gi
  requests:
    cpu: 100m
    memory: 1.5Gi
roles:
  data: "true"
  ingest: "true"
  master: "true"
schedulerName: ""
secretMounts: []
securityContext:
  capabilities:
    drop:
    - ALL
  runAsNonRoot: true
  runAsUser: 1000
service:
  annotations: {}
  nodePort: null
  type: ClusterIP
sidecarResources: {}
sysctlInitContainer:
  enabled: true
sysctlVmMaxMapCount: 262144
terminationGracePeriod: 120
tolerations: []
transportPort: 9300
updateStrategy: RollingUpdate
volumeClaimTemplate:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi

HOOKS:
---
# elasticsearch-sjzcy-test
apiVersion: v1
kind: Pod
metadata:
  name: "elasticsearch-sjzcy-test"
  annotations:
    "helm.sh/hook": test-success
spec:
  containers:
  - name: "elasticsearch-sphns-test"
    image: "docker.elastic.co/elasticsearch/elasticsearch:7.3.0"
    command:
      - "sh"
      - "-c"
      - |
        #!/usr/bin/env bash -e
        curl -XGET --fail 'elasticsearch-master:9200/_cluster/health?wait_for_status=green&timeout=1s'
  restartPolicy: Never
MANIFEST:

---
# Source: elasticsearch/templates/poddisruptionbudget.yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: "elasticsearch-master-pdb"
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: "elasticsearch-master"
---
# Source: elasticsearch/templates/service.yaml
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch-master
  labels:
    heritage: "Tiller"
    release: "elasticsearch"
    chart: "elasticsearch-7.3.0"
    app: "elasticsearch-master"
  annotations:
    {}
    
spec:
  type: ClusterIP
  selector:
    heritage: "Tiller"
    release: "elasticsearch"
    chart: "elasticsearch-7.3.0"
    app: "elasticsearch-master"
  ports:
  - name: http
    protocol: TCP
    port: 9200
  - name: transport
    protocol: TCP
    port: 9300
---
# Source: elasticsearch/templates/service.yaml
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch-master-headless
  labels:
    heritage: "Tiller"
    release: "elasticsearch"
    chart: "elasticsearch-7.3.0"
    app: "elasticsearch-master"
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
spec:
  clusterIP: None # This is needed for statefulset hostnames like elasticsearch-0 to resolve
  # Create endpoints also if the related pod isn't ready
  publishNotReadyAddresses: true
  selector:
    app: "elasticsearch-master"
  ports:
  - name: http
    port: 9200
  - name: transport
    port: 9300
---
# Source: elasticsearch/templates/statefulset.yaml
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: elasticsearch-master
  labels:
    heritage: "Tiller"
    release: "elasticsearch"
    chart: "elasticsearch-7.3.0"
    app: "elasticsearch-master"
  annotations:
    esMajorVersion: "7"
spec:
  serviceName: elasticsearch-master-headless
  selector:
    matchLabels:
      app: "elasticsearch-master"
  replicas: 3
  podManagementPolicy: Parallel
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-master
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 30Gi
      
  template:
    metadata:
      name: "elasticsearch-master"
      labels:
        heritage: "Tiller"
        release: "elasticsearch"
        chart: "elasticsearch-7.3.0"
        app: "elasticsearch-master"
      annotations:
        
    spec:
      securityContext:
        fsGroup: 1000
        
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - "elasticsearch-master"
            topologyKey: kubernetes.io/hostname
      terminationGracePeriodSeconds: 120
      volumes:
      initContainers:
      - name: configure-sysctl
        securityContext:
          runAsUser: 0
          privileged: true
        image: "docker.elastic.co/elasticsearch/elasticsearch:7.3.0"
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        resources:
          {}
          
      containers:
      - name: "elasticsearch"
        securityContext:
          capabilities:
            drop:
            - ALL
          runAsNonRoot: true
          runAsUser: 1000
          
        image: "docker.elastic.co/elasticsearch/elasticsearch:7.3.0"
        imagePullPolicy: "IfNotPresent"
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 3
          timeoutSeconds: 5
          
          exec:
            command:
              - sh
              - -c
              - |
                #!/usr/bin/env bash -e
                # If the node is starting up wait for the cluster to be ready (request params: 'wait_for_status=green&timeout=1s' )
                # Once it has started only check that the node itself is responding
                START_FILE=/tmp/.es_start_file

                http () {
                    local path="${1}"
                    if [ -n "${ELASTIC_USERNAME}" ] && [ -n "${ELASTIC_PASSWORD}" ]; then
                      BASIC_AUTH="-u ${ELASTIC_USERNAME}:${ELASTIC_PASSWORD}"
                    else
                      BASIC_AUTH=''
                    fi
                    curl -XGET -s -k --fail ${BASIC_AUTH} http://127.0.0.1:9200${path}
                }

                if [ -f "${START_FILE}" ]; then
                    echo 'Elasticsearch is already running, lets check the node is healthy'
                    http "/"
                else
                    echo 'Waiting for elasticsearch cluster to become cluster to be ready (request params: "wait_for_status=green&timeout=1s" )'
                    if http "/_cluster/health?wait_for_status=green&timeout=1s" ; then
                        touch ${START_FILE}
                        exit 0
                    else
                        echo 'Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )'
                        exit 1
                    fi
                fi
        ports:
        - name: http
          containerPort: 9200
        - name: transport
          containerPort: 9300
        resources:
          limits:
            cpu: 1000m
            memory: 1.5Gi
          requests:
            cpu: 100m
            memory: 1.5Gi
          
        env:
          - name: node.name
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: cluster.initial_master_nodes
            value: "elasticsearch-master-0,elasticsearch-master-1,elasticsearch-master-2,"
          - name: discovery.seed_hosts
            value: "elasticsearch-master-headless"
          - name: cluster.name
            value: "elasticsearch"
          - name: network.host
            value: "0.0.0.0"
          - name: ES_JAVA_OPTS
            value: "-Xmx1g -Xms1g"
          - name: node.data
            value: "true"
          - name: node.ingest
            value: "true"
          - name: node.master
            value: "true"
        volumeMounts:
          - name: "elasticsearch-master"
            mountPath: /usr/share/elasticsearch/data

And the ooutput of running kubectl get pods --namespace=elasticsearch -l app=elasticsearch-master

NAME                     READY     STATUS             RESTARTS   AGE
elasticsearch-master-0   0/1       CrashLoopBackOff   4          4m
elasticsearch-master-1   0/1       Running            0          4m
elasticsearch-master-2   0/1       Error              4          3m

Issue Analytics

  • State:closed
  • Created 4 years ago
  • Reactions:2
  • Comments:16 (5 by maintainers)

github_iconTop GitHub Comments

4reactions
ashantykcommented, Feb 6, 2020

Found the problem: antiAffinity was set to hard and but I didn’t have enough nodes so only one pod was being scheduled… since that pod couldn’t go green (not finding enough masters) the deploy hanged.

0reactions
botelastic[bot]commented, Jun 5, 2020

This issue has been automatically closed because it has not had recent activity since being marked as stale.

Read more comments on GitHub >

github_iconTop Results From Across the Web

Common problems | Elastic Cloud on Kubernetes [2.5]
Operator crashes on startup with OOMKilled edit. On very large Kubernetes clusters with many hundreds of resources (pods, secrets, config maps, and so...
Read more >
Elasticsearch pods go into CrashLoopBackOff error after ... - IBM
When upgrading CP4BA, it is observed that iaf-systemelasticsearc-es-data-* pods goes into CrashLoopBackOff error and is unable to restart on its own.
Read more >
why elasticsearch cluster in kubernetes start so slow
Make sure the volume mount works fine without delay. Check InitContainer completion time . InitContainer startup is sequential , so that can ...
Read more >
elasticsearch 19.5.5 · bitnami/bitnami - Artifact Hub
Elasticsearch is a distributed search and analytics engine. ... Spread Constraints for pod assignment spread across your cluster among failure-domains.
Read more >
Elasticsearch operator getting timed out while connecting to ...
Try to reach the Elasticsearch service through the Elasticsearch operator pod. If it did not work, check from the node level. ... Try...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found