Liveness probe performance issue
Description of the issue
I deployed an ERPNext Helm chart on an in-house K8s cluster. Recently, after upgrading to a newer chart (from 1.0.0 to 1.0.14), I’ve been noticing significant, constant CPU usage. It turned out to be caused by the liveness probe for the scheduler and all workers, i.e. the doctor.py script. It’s not that critical when the server is idle, but under heavy load it has a significant performance impact.
The first point is that exactly the same script is executed for multiple containers, and it doesn’t really check whether a given container is live, but whether all backend services are live. Another is that it’s a Python script, which by itself isn’t particularly efficient for this purpose. I.e. every 5s, 4 instances of docker-entrypoint.sh are executed, each of which executes su (which also spams the system log), initialises the Python environment, loads the Python interpreter with all the required libraries and runs a script which, as far as I understand after briefly reading the code, basically does a TCP liveness check.
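To illustrate how little work a TCP liveness check like the one mentioned above needs, here is a minimal sketch. It is not the actual doctor.py code; the function name tcp_alive and the hosts and ports in the example are hypothetical placeholders.

```python
import socket


def tcp_alive(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical targets; real service names and ports depend on the deployment.
    targets = {"redis-queue": ("localhost", 6379), "redis-cache": ("localhost", 6380)}
    for name, (host, port) in targets.items():
        print(name, "up" if tcp_alive(host, port) else "down")
```

The check itself is only a handshake per target, so most of the probe cost described above would come from starting the shell, su and the Python interpreter rather than from the check.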
Context information (for bug reports)
Kubernetes cluster (v1.18.8) deployed on a dedicated server running Fedora 33, with erpnext-nginx v12.10.1, erpnext-worker v12.10.1, socketio v12.8.4 and Helm chart v1.0.14.
Steps to reproduce the issue
- Deploy ERPNext Helm chart.
- Watch CPU usage as doctor.py is being executed every 5s for the scheduler and all 3 workers.
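As a rough way to put a number on the steps above, the following sketch times how long it takes just to start a Python interpreter that does nothing. It measures wall-clock time, not CPU time, and it leaves out the entrypoint script, su and the library imports, so it can only give a lower bound. The helper name mean_startup_seconds is hypothetical.

```python
import subprocess
import sys
import time


def mean_startup_seconds(runs: int = 5) -> float:
    """Average wall-clock time to start a Python interpreter that does nothing."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    per_launch = mean_startup_seconds()
    # 4 containers probed every 5 s, as described in the report.
    launches_per_minute = 4 * (60 / 5)
    print(f"bare interpreter start: {per_launch * 1000:.1f} ms")
    print(f"lower bound per minute: {per_launch * launches_per_minute:.2f} s")
```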
Observed result
High CPU usage every 5s on a task that should be as optimised as possible (i.e. a liveness probe).
Expected result
No significant CPU load during liveness check.
Stacktrace / full error message if available
Not relevant.
Top GitHub Comments
no netcat used in frappe-nginx/docker-entrypoint.sh
concerns:
@MarekPikula Thanks, this is great from a learning and debugging point of view. I am new to Kubernetes and the Frappe Helm charts, so after you posted this bug I started learning about liveness and readiness probes in order to understand the charts. I hope we get a solution so the learning process continues!