AES needs many restarts before it reaches the Running state.
Describe the bug
AES was deployed on an Azure AKS cluster (aks… Ready agent v1.13.12 4.15.0-1082-azure docker://3.0.10+azure). While the ambassador-… pods are starting, a Python error appears in the logs and /diag/ returns a 500 Internal Server Error:
2020-05-07 09:36:28 diagd 1.4.2 [P219TThreadPoolExecutor-0_4] INFO: 7C716B56-786E-426C-B03C-96554F086837: 127.0.0.1 "GET /ambassador/v0/diag/" START
2020-05-07 09:36:28 diagd 1.4.2 [P219TThreadPoolExecutor-0_4] ERROR: 'NoneType' object has no attribute 'overview'
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/ambassador-0.0.0.dev0-py3.7.egg/ambassador_diag/diagd.py", line 233, in wrapper
    result = f(*args, reqid=reqid, **kwds)
  File "/usr/lib/python3.7/site-packages/ambassador-0.0.0.dev0-py3.7.egg/ambassador_diag/diagd.py", line 519, in show_overview
    ov = diag.overview(request, app.estats)
AttributeError: 'NoneType' object has no attribute 'overview'
2020-05-07 09:36:28 diagd 1.4.2 [P219TThreadPoolExecutor-0_4] ERROR: 7C716B56-786E-426C-B03C-96554F086837: 127.0.0.1 "GET /ambassador/v0/diag/" 1ms 500 server error
time="2020-05-07 09:36:28" level=error msg="Bad HTTP response" func=github.com/datawire/apro/cmd/amb-sidecar/devportal/server.HTTPGet.func1 file="github.com/datawire/apro@/cmd/amb-sidecar/devportal/server/fetcher.go:165" status_code=500 subsystem=fetcher url="http://127.0.0.1:8877/ambassador/v0/diag/?json=true"
time="2020-05-07 09:36:28" level=error msg="HTTP error 500 from http://127.0.0.1:8877/ambassador/v0/diag/?json=true" func=github.com/datawire/apro/cmd/amb-sidecar/devportal/server.HTTPGet file="github.com/datawire/apro@/cmd/amb-sidecar/devportal/server/fetcher.go:172" subsystem=fetcher url="http://127.0.0.1:8877/ambassador/v0/diag/?json=true"
time="2020-05-07 09:36:28" level=info msg="HTTP error 500 from http://127.0.0.1:8877/ambassador/v0/diag/?json=true" func="github.com/datawire/apro/cmd/amb-sidecar/devportal/server.(*fetcher)._retrieve" file="github.com/datawire/apro@/cmd/amb-sidecar/devportal/server/fetcher.go:195"
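The traceback shows that `diag` was still `None` when `show_overview` called `diag.overview(...)`. One plausible reading (an assumption, not confirmed in this thread) is that diagd had not finished building its diagnostics from the first configuration when the request arrived. The minimal Python sketch below reproduces that failure shape and shows one generic guard, answering 503 until the object exists instead of raising. `Diagnostics`, `DiagState`, and both `show_overview*` functions are hypothetical stand-ins, not Ambassador's actual code.

```python
from typing import Optional, Tuple


class Diagnostics:
    """Hypothetical stand-in for the object diagd builds once config is loaded."""

    def overview(self) -> dict:
        return {"status": "ok"}


class DiagState:
    """Hypothetical holder: `diag` stays None until the first config is processed."""

    def __init__(self) -> None:
        self.diag: Optional[Diagnostics] = None


def show_overview_unguarded(state: DiagState) -> dict:
    # Same shape as the failing line in the traceback:
    # raises AttributeError while state.diag is still None.
    return state.diag.overview()


def show_overview(state: DiagState) -> Tuple[int, dict]:
    # Guarded variant: report "not ready yet" (503) instead of crashing with a 500.
    if state.diag is None:
        return 503, {"error": "diagnostics not ready yet"}
    return 200, state.diag.overview()


if __name__ == "__main__":
    state = DiagState()
    try:
        show_overview_unguarded(state)
    except AttributeError as exc:
        print("unguarded:", exc)  # 'NoneType' object has no attribute 'overview'
    print("guarded:", show_overview(state))  # 503 while diag is None
    state.diag = Diagnostics()
    print("after load:", show_overview(state))  # 200 once diag exists
```

Returning 503 tells a caller such as the devportal fetcher to retry later, whereas an unhandled `AttributeError` surfaces as the 500 seen in the log.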
To Reproduce
Steps to reproduce the behavior:
- Follow the installation documentation: https://www.getambassador.io/docs/latest/topics/install/
Expected behavior
A stable, running deployment, with pods that start in about 1 minute.
Versions:
- Ambassador: 1.3.1, 1.4.1, 1.4.2
- Kubernetes environment: AKS
- Kubernetes version: 1.13.12, 1.16.7
Additional context
The problem persists. Deleting the pod does not help, and the problem remains when I scale the pods up and down. Every new pod needs 7 to 10 restarts before it reaches the Running state.
This liveness and readiness probe configuration seems to have helped us:
initialDelaySeconds: 90
periodSeconds: 60
timeoutSeconds: 15
failureThreshold: 10
successThreshold: 1
With these settings the 500 error still occurs, but the pods no longer restart, and they start in about 5 minutes.
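A rough way to see why these values stop the restart loop: the kubelet starts probing after `initialDelaySeconds`, probes every `periodSeconds`, and restarts the container only after `failureThreshold` consecutive liveness failures. The Python sketch below estimates how long a container may stay unhealthy before it is restarted, and when a readiness probe would first mark it Ready. It is a simplification that ignores kubelet jitter and assumes the last failing probe can take the full `timeoutSeconds`; `Probe`, `restart_deadline`, and `ready_at` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Probe:
    # Defaults match Kubernetes' defaults for probe fields left unset.
    initial_delay_seconds: int = 0
    period_seconds: int = 10
    timeout_seconds: int = 1
    failure_threshold: int = 3


def restart_deadline(p: Probe) -> int:
    """Approximate seconds after container start at which a container that
    never passes its liveness probe is restarted (jitter ignored)."""
    # Probe k (1-based) starts at initial_delay + (k - 1) * period; the restart
    # follows the failure_threshold-th failure, which may run until its timeout.
    return (p.initial_delay_seconds
            + (p.failure_threshold - 1) * p.period_seconds
            + p.timeout_seconds)


def ready_at(p: Probe, healthy_after: int) -> int:
    """Approximate time of the first successful readiness probe for a container
    that becomes healthy `healthy_after` seconds after start."""
    if healthy_after <= p.initial_delay_seconds:
        return p.initial_delay_seconds
    periods = math.ceil((healthy_after - p.initial_delay_seconds) / p.period_seconds)
    return p.initial_delay_seconds + periods * p.period_seconds


reported = Probe(initial_delay_seconds=90, period_seconds=60,
                 timeout_seconds=15, failure_threshold=10)

print(restart_deadline(Probe()))   # 0 + 2*10 + 1 = 21
print(restart_deadline(reported))  # 90 + 9*60 + 15 = 645
print(ready_at(reported, 200))     # 90 + 2*60 = 210
```

Under these assumptions, Kubernetes' default probe values would restart a container that is not alive after roughly 21 seconds, while the reported settings allow about 10 to 11 minutes, comfortably longer than the ~5 minute startup observed. The trade-off is that a 60-second `periodSeconds` can delay the Ready transition by up to a minute after the container is actually healthy.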
ambassador-7788d44cd7-7ndt2 1/1 Running 0 5m16s
ambassador-86dbc79c74-64bmk 1/1 Running 0 4m2s
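To confirm that the pods stay up after a probe change, the RESTARTS column of `kubectl get pods` output like the lines above can be checked mechanically. This is a minimal Python sketch assuming the default five-column layout (NAME, READY, STATUS, RESTARTS, AGE). Some kubectl versions append a suffix such as "(30s ago)" to RESTARTS, which the pattern tolerates. `parse_pods` and `unstable_pods` are hypothetical helpers, not part of kubectl.

```python
import re
from typing import List, NamedTuple

# NAME  READY  STATUS  RESTARTS [optional "(30s ago)" suffix]  AGE
POD_LINE = re.compile(
    r"^(?P<name>\S+)\s+(?P<ready>\d+/\d+)\s+(?P<status>\S+)\s+"
    r"(?P<restarts>\d+)(?:\s+\([^)]*\))?\s+(?P<age>\S+)$"
)


class PodRow(NamedTuple):
    name: str
    ready: str
    status: str
    restarts: int
    age: str


def parse_pods(text: str) -> List[PodRow]:
    """Parse `kubectl get pods` text output; header and blank lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = POD_LINE.match(line.strip())
        if m:  # the "NAME READY ..." header does not match and is skipped
            rows.append(PodRow(m["name"], m["ready"], m["status"],
                               int(m["restarts"]), m["age"]))
    return rows


def unstable_pods(rows: List[PodRow], max_restarts: int = 0) -> List[str]:
    """Names of pods that are not Running or have restarted more than allowed."""
    return [r.name for r in rows
            if r.status != "Running" or r.restarts > max_restarts]


sample = """NAME                          READY   STATUS    RESTARTS   AGE
ambassador-7788d44cd7-7ndt2   1/1     Running   0          5m16s
ambassador-86dbc79c74-64bmk   1/1     Running   0          4m2s
"""
print(unstable_pods(parse_pods(sample)))  # [] : both pods Running with 0 restarts
```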
The issue also exists in version 1.5.2.