User node group will not scale to zero nodes on Azure
OS system and architecture in which you are running QHub
Mac M1
Expected behavior
When min_nodes: 0 is set in the user node group, I expect the node group to scale to zero nodes when there are no active users.
Actual behavior
When there is no active user notebook, the user node group stays up with one node, and these pods running:
NAMESPACE NAME PF READY RESTARTS STATUS CPU MEM %CPU/R %CPU/L %MEM/R %MEM/L IP NODE AGE
dev qhub-prometheus-node-exporter-7dj6b ● 1/1 0 Running 4 20 n/a n/a n/a n/a 10.224.0.5 aks-user-41209954-vmss000003 38h
dev user-scheduler-8f67c547d-gkxfv ● 1/1 0 Running 2 21 n/a n/a n/a n/a 10.244.9.2 aks-user-41209954-vmss000003 39h
dev user-scheduler-8f67c547d-qjbt6 ● 1/1 0 Running 3 21 n/a n/a n/a n/a 10.244.9.3 aks-user-41209954-vmss000003 39h
kube-system azure-ip-masq-agent-v4zjf ● 1/1 0 Running 1 16 1 0 32 6 10.224.0.5 aks-user-41209954-vmss000003 38h
kube-system cloud-node-manager-58cx8 ● 1/1 0 Running 1 20 2 0 41 4 10.224.0.5 aks-user-41209954-vmss000003 38h
kube-system csi-azuredisk-node-vv9zm ● 3/3 0 Running 3 49 10 n/a 82 12 10.224.0.5 aks-user-41209954-vmss000003 38h
kube-system csi-azurefile-node-26nxp ● 3/3 0 Running 3 44 10 n/a 73 7 10.224.0.5 aks-user-41209954-vmss000003 38h
kube-system kube-proxy-898w6 ● 1/1 0 Running 1 25 1 n/a n/a n/a 10.224.0.5 aks-user-41209954-vmss000003 38h
How to Reproduce the problem?
Create a QHub deployment on Azure with min_nodes: 0 in the user node group.
Command output
No response
Versions and dependencies used.
Currently using my fork of QHub, which upgrades the azurerm provider to 3.22 so that QHub works on the Mac M1. See #1430.
$ conda --version
conda 4.14.0
$ kubectl version
Client Version: v1.25.0
Kustomize Version: v4.5.7
Server Version: v1.23.5
$ qhub --version
0.5.0.dev4+g9de633c
Compute environment
Azure
Integrations
No response
Anything else?
No response
Issue Analytics
- State:
- Created a year ago
- Comments:10 (10 by maintainers)
In case anyone else stumbles on this while working on GCP: I also found that some kube-system pods were blocking scale-down of user and/or worker nodes, particularly the kube-dns and metrics-server pods. I seem to have been able to overcome this by doing three things:
Thank you very much for looking into this and finding this result! I agree this appears to be expected behavior considering the need for the user-scheduler pod. This will be a great help regarding our prepurchase plan. Cheers!
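To make that reasoning concrete, here is a minimal Python sketch that separates the pods in the listing above into DaemonSet pods and everything else. It rests on two assumptions. First, the owner kinds are not shown in the listing; they are inferred here from the pod names (the name-hash-suffix pattern of the user-scheduler pods suggests a Deployment/ReplicaSet, the rest look like DaemonSets), so check them against the cluster before relying on this. Second, it uses the general cluster-autoscaler rule that DaemonSet pods do not by themselves keep a node alive, while other pods must be able to move to another node before the node can be removed. The `Pod` type and the `pods_needing_a_home` helper are hypothetical names made up for this illustration, not part of QHub or Kubernetes.

```python
from typing import List, NamedTuple


class Pod(NamedTuple):
    namespace: str
    name: str
    owner_kind: str  # ASSUMPTION: inferred from the pod name, not present in the listing


# Pods reported on aks-user-41209954-vmss000003 in the issue.
# Owner kinds are guesses based on naming; verify them on a real cluster.
PODS_ON_USER_NODE: List[Pod] = [
    Pod("dev", "qhub-prometheus-node-exporter-7dj6b", "DaemonSet"),
    Pod("dev", "user-scheduler-8f67c547d-gkxfv", "ReplicaSet"),
    Pod("dev", "user-scheduler-8f67c547d-qjbt6", "ReplicaSet"),
    Pod("kube-system", "azure-ip-masq-agent-v4zjf", "DaemonSet"),
    Pod("kube-system", "cloud-node-manager-58cx8", "DaemonSet"),
    Pod("kube-system", "csi-azuredisk-node-vv9zm", "DaemonSet"),
    Pod("kube-system", "csi-azurefile-node-26nxp", "DaemonSet"),
    Pod("kube-system", "kube-proxy-898w6", "DaemonSet"),
]


def pods_needing_a_home(pods: List[Pod]) -> List[Pod]:
    """Return the pods that are not DaemonSet-managed.

    These are the pods that would have to be rescheduled onto another
    node before this node could be removed.
    """
    return [pod for pod in pods if pod.owner_kind != "DaemonSet"]


if __name__ == "__main__":
    for pod in pods_needing_a_home(PODS_ON_USER_NODE):
        print(f"{pod.namespace}/{pod.name}")
```

Under these assumptions only the two user-scheduler pods remain after filtering, which is consistent with the conclusion that they are what keeps the last user node up. Whether they could run elsewhere depends on their scheduling constraints, which are not shown in this issue.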