Readiness probe health check parameters are incorrect
Chart version: 7.4.1
Kubernetes version: 1.14, GitVersion:“v1.14.6-eks-5047ed”
Kubernetes provider: EKS
Helm Version: v2.14.2
Describe the bug: Default clusterHealthCheckParams is not suitable for any production helm deployment.
Steps to reproduce:
- Use this elasticsearch chart in the default configuration.
clusterHealthCheckParams: "wait_for_status=green&timeout=1s"
- Perform an upgrade, index restore, or anything else that causes the cluster to stop returning green status.
- Any helm operation after this will take the cluster offline. With helm --recreate-pods this happens even sooner, and the cluster never recovers. Any failover scenario (e.g. Kubernetes node replacement) will also immediately break the release.
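The failure mode in the steps above can be illustrated with a small model. This is only a sketch of the status comparison, assuming the cluster health API treats wait_for_status as "this status or better"; it is not the chart's actual probe script, and the function name is hypothetical.

```python
# Simplified model of a wait_for_status check (hypothetical helper,
# not the chart's real readiness script).
STATUS_RANK = {"red": 0, "yellow": 1, "green": 2}


def health_check_passes(cluster_status: str, wait_for_status: str) -> bool:
    """Return True if the cluster status is at least as good as the requested one."""
    return STATUS_RANK[cluster_status] >= STATUS_RANK[wait_for_status]


if __name__ == "__main__":
    # Compare the default (green) against a more tolerant setting (yellow).
    for status in ("green", "yellow", "red"):
        print(
            status,
            "default(green):", health_check_passes(status, "green"),
            "relaxed(yellow):", health_check_passes(status, "yellow"),
        )
```

Under this model, a cluster that is yellow during an upgrade or restore fails a check that waits for green, while a check that waits for yellow still passes.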
I will comment on this issue with a suitable set of replacements, but a combination of liveness and readiness probes using _cluster/health?local=true&wait_for_status=[yellow,green]&timeout=1s should do it.
I think the timeout is too low to always return but I might be wrong.
I’ll look at the stable/elasticsearch chart, as I think they mostly resolved these issues, although it took a while.
Issue Analytics
- State:
- Created 4 years ago
- Reactions:1
- Comments:13 (4 by maintainers)
Top GitHub Comments
@lachlanbb Thank you for opening an issue, and apologies if this is causing you pain.
The params used by the readinessProbe curl request can be overridden by setting the clusterHealthCheckParams value: https://github.com/elastic/helm-charts/blob/0411010e8f03c25c06d864ce62f5b482b5439bdb/elasticsearch/values.yaml#L198
So you should be able to set this param to wait_for_status=yellow&timeout=1s
- Note that wait_for_status takes a single value.
Are you able to give this a try and report back?
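To make the single-value point concrete, here is a minimal sketch that checks a clusterHealthCheckParams string before it goes into the chart values. The helper name is hypothetical, and it assumes the only accepted statuses are green, yellow and red.

```python
from urllib.parse import parse_qs

# Statuses assumed to be accepted by wait_for_status.
VALID_STATUSES = {"green", "yellow", "red"}


def check_health_params(params: str) -> dict:
    """Parse a query string and verify wait_for_status is one valid status."""
    parsed = parse_qs(params, strict_parsing=True)
    statuses = parsed.get("wait_for_status", [])
    if len(statuses) != 1 or statuses[0] not in VALID_STATUSES:
        raise ValueError(f"wait_for_status must be a single status, got {statuses!r}")
    # Flatten single-element lists into plain values for readability.
    return {key: values[0] for key, values in parsed.items()}


if __name__ == "__main__":
    print(check_health_params("wait_for_status=yellow&timeout=1s"))
    try:
        check_health_params("wait_for_status=[yellow,green]&timeout=1s")
    except ValueError as err:
        print("rejected:", err)
```

The list form from the original report is rejected here, while wait_for_status=yellow is accepted.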
I’ve increased readinessProbe timeout to 200s 😄 Then it worked. In managed kubernetes this is somehow an issue too.