Observing "no healthy upstream" for new deployments until ambassador pods restarted
Description of the problem
I am facing a very strange problem. Our IT department wants us to migrate the application testing pipeline to a new cluster. After deploying Ambassador with Helm (originally 1.12.0), I tested the deployments of our applications: all the deployments were successful, however on accessing the applications I consistently got the error “no healthy upstream” (the same deployments work in the old cluster).
At some point I learned that 1.12.1 had been released and upgraded Ambassador to it with “helm upgrade”. After that, all the previously non-working application deployments started working without any additional changes, but every new deployment hit the same issue: the “no healthy upstream” error. Eventually Ambassador was upgraded to 1.12.2 with the same effect: the old non-working deployments started working without any changes, and every new deployment got “no healthy upstream”.
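For reference, the install and upgrades were done roughly like this (the Helm repo URL is an assumption; the release name, namespace, and chart version match the helm list output below):
$ helm repo add datawire https://www.getambassador.io   # repo URL assumed for the datawire/ambassador chart
$ helm repo update
$ helm -n ambassador upgrade --install ambassador datawire/ambassador --version 6.6.2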
Investigating connectivity confirmed that the application is reachable with curl from the Ambassador pod, both via the application’s Service and directly via the application pod. However, external requests to the application always ended in “no healthy upstream”.
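The connectivity checks looked roughly like this (the pod label selector is assumed from the chart’s standard labels; the application Service name, namespace, pod IP, and port are placeholders):
$ POD=$(kubectl -n ambassador get pods -l app.kubernetes.io/name=ambassador -o jsonpath='{.items[0].metadata.name}')
$ kubectl -n ambassador exec -it "$POD" -- curl -sv http://my-app.my-namespace.svc.cluster.local:8080/   # via the app Service (placeholder name)
$ kubectl -n ambassador exec -it "$POD" -- curl -sv http://10.244.1.23:8080/   # directly to the app pod IP (placeholder address)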
Now, if the Ambassador pod is killed (the replica count was reduced to 1 to simplify log analysis) and the Deployment/ReplicaSet replaces it with a new pod, the issue is resolved: all non-working deployments start working (this was tested 3 times).
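The restart was forced roughly like this (the Deployment name and label selector are assumptions based on the release name):
$ kubectl -n ambassador scale deployment ambassador --replicas=1
$ kubectl -n ambassador delete pod -l app.kubernetes.io/name=ambassador   # the ReplicaSet recreates the pod; afterwards the new deployments work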
Details on the current deployment:
$ helm -n ambassador list
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
ambassador ambassador 14 2021-03-31 10:20:18.8370383 -0400 EDT deployed ambassador-6.6.2 1.12.2
Is there something that I might be missing during the deployment of Ambassador?
Expected behavior
All new application deployments start working without needing to restart the Ambassador pods.
Versions:
- Ambassador: 1.12.2 (1.12.0, 1.12.1)
- Kubernetes environment: Azure Kubernetes Service (AKS) - private-link cluster (i.e. no access from the public internet and only internal LBs; the annotation “service.beta.kubernetes.io/azure-load-balancer-internal” is set to “true” - see the check after this list)
- Version: v1.18.14
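A quick way to verify that the Ambassador Service carries the internal-LB annotation and received a private address (the Service name ambassador is an assumption based on the release name):
$ kubectl -n ambassador get svc ambassador -o jsonpath='{.metadata.annotations}'   # should include service.beta.kubernetes.io/azure-load-balancer-internal
$ kubectl -n ambassador get svc ambassador   # EXTERNAL-IP should be a private (internal LB) address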
Additional context
None. I am not sure whether this is a bug or not. I would appreciate any workaround for our environment.
Issue Analytics
- State:
- Created 2 years ago
- Reactions: 1
- Comments: 15 (4 by maintainers)
Top GitHub Comments
No, actually it’s a different issue. Upstream services get disconnected for no clear reason, and we get the “no healthy upstream” error. This happens a few hours after the last deployment in the cluster. If we make a new deployment in the cluster, the error disappears.
@wissam-launchtrip can you go into a bit more detail? Are you seeing this exact issue or something similar? Anything that can help us verify the report and reproduce the issue for a possible fix 👍