[elasticsearch] New readiness probes causing full cluster-restart

Chart version: 7.7.0

Kubernetes version: 1.17.1

Kubernetes provider: On-prem

Helm Version: 3.2.0

Output of helm get release:
USER VALUES

esConfig:
  elasticsearch.yml: |
    action.auto_create_index: "-hd-*"
esJavaOpts: -Xmx5g -Xms5g
esMajorVersion: 6
image: "private-image-based-on-official-oss:6.4.2"
imagePullPolicy: IfNotPresent
imagePullSecrets:
- name: some-credentials
imageTag: 6.4.2
ingress:
  annotations:
    ingress.kubernetes.io/ssl-redirect: "true"
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/auth-realm: Authentication Required
    nginx.ingress.kubernetes.io/auth-secret: dev-elasticsearch-ingress-auth
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/proxy-body-size: 60m
  enabled: true
  hosts:
  - ""
  path: /
  tls:
  - hosts:
    - ""
lifecycle:
  postStart:
    exec:
      command:
      - bash
      - -c
      - |
        #!/bin/bash
        cd /usr/share/elasticsearch/plugins/xxx
        /opt/jdk-10.0.2/bin/jar -cf config.jar config.cfg
        chmod 777 config.jar
persistence:
  enabled: true
podSecurityPolicy:
  create: false
rbac:
  create: true
resources:
  limits:
    cpu: 2000m
    memory: 8Gi
  requests:
    cpu: 200m
    memory: 8Gi
sysctlInitContainer:
  enabled: false
volumeClaimTemplate:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: rook-ceph-cephfs

Describe the bug: I updated the chart to the newest version, 7.7.0, and expected the three Elasticsearch nodes to be updated one after another, with each step waiting until the cluster is green again. (In the past, the most recently restarted pod was not ready until the cluster was green again.) Now the pod became ready after a few minutes and Kubernetes moved on too quickly, so the cluster went red, though it was not down.

Steps to reproduce:

  1. Update the chart installation to 7.7.0

Expected behavior: The readiness probe works as expected and marks the pod as not ready until the cluster is green again.
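A minimal sketch of the readiness decision described above, assuming the check is based on the `status` and `timed_out` fields that Elasticsearch returns from `GET /_cluster/health`. This is not the chart's actual probe script; the function names `is_ready` and `fetch_health` and the default URL are hypothetical and only illustrate the idea that a pod should stay "not ready" until the cluster reports green.

```python
import json
import urllib.request

# Status values reported in the "status" field of GET /_cluster/health,
# ordered from worst to best.
STATUS_RANK = {"red": 0, "yellow": 1, "green": 2}


def is_ready(health: dict, required: str = "green") -> bool:
    """Return True only if the reported cluster status meets `required`.

    `health` is the parsed JSON body of a cluster health response.
    (Hypothetical helper, not part of the Helm chart.)
    """
    status = health.get("status")
    if status not in STATUS_RANK:
        return False  # missing or unknown status: treat as not ready
    if health.get("timed_out", False):
        return False  # the health request itself timed out
    return STATUS_RANK[status] >= STATUS_RANK[required]


def fetch_health(base_url: str = "http://127.0.0.1:9200", timeout: float = 5.0) -> dict:
    """Fetch cluster health from a node (the URL is an assumption)."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Exit code 0 means "ready", 1 means "not ready", as an exec probe would expect.
    try:
        ok = is_ready(fetch_health())
    except OSError:
        ok = False  # node unreachable: not ready
    raise SystemExit(0 if ok else 1)
```

With a check like this, a rolling update would only proceed to the next pod once the restarted node has rejoined and the cluster is green again, which is the behavior the reporter expected.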

Any additional context: I have updated my release often in the past without such problems, but today it happened every time with the new version.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 10 (10 by maintainers)

Top GitHub Comments

fatmcgav commented, May 28, 2020 (2 reactions)

@ckotzbauer The fix in #638 has been merged and back-ported to the 6.8 and 7.7 branches for inclusion in the next minor release.

ckotzbauer commented, May 20, 2020 (1 reaction)

Thank you so much for digging in, @fatmcgav. I really appreciate it! This sounds promising. I will pin my chart to the 6.8.x version and wait for the patch release. Thanks again for your work!
