Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Readiness probe health check parameters are incorrect

See original GitHub issue

Chart version: 7.4.1

Kubernetes version: 1.14, GitVersion:“v1.14.6-eks-5047ed”

Kubernetes provider: EKS

Helm Version: v2.14.2

Describe the bug: Default clusterHealthCheckParams is not suitable for any production helm deployment.

Steps to reproduce:

Use this elasticsearch chart in the default configuration.

clusterHealthCheckParams: "wait_for_status=green&timeout=1s"

Perform an upgrade, indice restore or anything that causes the cluster to stop returning green status
Any helm operations after this will take the cluster offline, when using helm --recreate-pods this is even more immediate and it will never recover. Any failureover scenarios (e.g kube node replacements) will also immediately break the release.

I will comment on this issue with a suitable set of replacements but a combination of liveliness & readiness with _cluster/health?local=true&wait_for_status=[yellow,green]&timeout=1s should do it.

I think the timeout is too low to always return but I might be wrong.

I’ll look stable/elasticsearch chart as I think they mostly resolved these issues, it took a while though.

Issue Analytics

State:
Created 4 years ago
Reactions:1
Comments:13 (4 by maintainers)

Top GitHub Comments

1reaction

fatmcgavcommented, Nov 18, 2019

@lachlanbb Thank you for opening an issue, and apologies if this is causing you pain.

The params used by the readinessProbe Curl request is override-able by setting the clusterHealthCheckParams value: https://github.com/elastic/helm-charts/blob/0411010e8f03c25c06d864ce62f5b482b5439bdb/elasticsearch/values.yaml#L198

So you should be able to set this param to wait_for_status=yellow&timeout=1s - Note that wait_for_status is a singular value:

wait_for_status (Optional, string) One of green, yellow or red. Will wait (until the timeout provided) until the status of the cluster changes to the one provided or better, i.e. green > yellow > red. By default, will not wait for any status.

Are you able to give this a try and report back?

1reaction

holmscommented, Nov 18, 2019

I’ve increased readinessProbe timeout to 200s 😄 Then it worked. In managed kubernetes this is somehow an issue too.

Top Results From Across the Web

Configure Liveness, Readiness and Startup Probes

This page shows how to configure liveness, readiness and startup probes for containers. The kubelet uses liveness probes to know when to restart ......

Kubernetes Liveness and Readiness Probes: How to Avoid ...

In this article, I will explore how to avoid making service reliability worse when implementing Kubernetes liveness and readiness probes.

Kubernetes Readiness Probes | Practical Guide - Komodor

A readiness probe indicates whether applications running in a container are ready to receive traffic. If so, Services in Kubernetes can send traffic...

How to Troubleshoot and Address Liveness / Readiness ...

Liveness / Readiness probe failure are caused by Jenkins being not responsive to a health check - currently done https://$POD_IP:8080/$MASTER_NAME/login.

Kubernetes : Configure Liveness and Readiness Probes

Restarting a container with a failing readiness probe will not fix it, so readiness failures receive no automatic reaction from Kubernetes.