User node group will not scale to zero nodes on Azure

OS system and architecture in which you are running QHub

Mac M1

Expected behavior

With min_nodes: 0 set for the user node group, I expect the node group to scale to zero nodes when no user is active.

Actual behavior

When there is no active user notebook, the user node group stays up with one node, and these pods keep running on it:

NAMESPACE      NAME                                   PF   READY     RESTARTS STATUS         CPU    MEM    %CPU/R    %CPU/L    %MEM/R    %MEM/L IP             NODE                             AGE
dev            qhub-prometheus-node-exporter-7dj6b    ●    1/1              0 Running          4     20       n/a       n/a       n/a       n/a 10.224.0.5     aks-user-41209954-vmss000003     38h
dev            user-scheduler-8f67c547d-gkxfv         ●    1/1              0 Running          2     21       n/a       n/a       n/a       n/a 10.244.9.2     aks-user-41209954-vmss000003     39h
dev            user-scheduler-8f67c547d-qjbt6         ●    1/1              0 Running          3     21       n/a       n/a       n/a       n/a 10.244.9.3     aks-user-41209954-vmss000003     39h
kube-system    azure-ip-masq-agent-v4zjf              ●    1/1              0 Running          1     16         1         0        32         6 10.224.0.5     aks-user-41209954-vmss000003     38h
kube-system    cloud-node-manager-58cx8               ●    1/1              0 Running          1     20         2         0        41         4 10.224.0.5     aks-user-41209954-vmss000003     38h
kube-system    csi-azuredisk-node-vv9zm               ●    3/3              0 Running          3     49        10       n/a        82        12 10.224.0.5     aks-user-41209954-vmss000003     38h
kube-system    csi-azurefile-node-26nxp               ●    3/3              0 Running          3     44        10       n/a        73         7 10.224.0.5     aks-user-41209954-vmss000003     38h
kube-system    kube-proxy-898w6                       ●    1/1              0 Running          1     25         1       n/a       n/a       n/a 10.224.0.5     aks-user-41209954-vmss000003     38h
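
A listing like the one above can be reproduced with kubectl, and adding the owner kind of each pod helps separate DaemonSet pods (which normally do not stop the cluster autoscaler from removing a node) from pods owned by a ReplicaSet, such as user-scheduler. The following is a minimal sketch, assuming kubectl is already configured for the cluster. The node name is copied from the listing; the function name `pods_on_node_cmd` and the `RUN` variable are illustrative and not part of QHub.

```shell
#!/bin/sh
# Sketch: show which pods sit on a node and what kind of controller owns each.
# NODE defaults to the node name from the listing above; adjust for your cluster.
NODE="${NODE:-aks-user-41209954-vmss000003}"

# Build the kubectl command as a string so it can be reviewed before running.
# (Hypothetical helper name, used only for this illustration.)
pods_on_node_cmd() {
  printf '%s\n' "kubectl get pods --all-namespaces --field-selector spec.nodeName=$1 -o 'custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind'"
}

# Print the command by default; set RUN=1 to execute it against the current context.
cmd="$(pods_on_node_cmd "$NODE")"
if [ "${RUN:-0}" = "1" ]; then
  eval "$cmd"
else
  echo "$cmd"
fi
```

Run it once without `RUN=1` to check the command, then with `RUN=1` to query the cluster. Rows whose OWNER column is not DaemonSet are the candidates to look at first.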

How to Reproduce the problem?

Create a QHub deployment on Azure with min_nodes: 0 in the user node group.

Command output

No response

Versions and dependencies used.

I am currently using my fork of QHub, which upgrades the azurerm provider to 3.22 so that QHub works on the Mac M1. See #1430.

$ conda --version
conda 4.14.0
$ kubectl version
Client Version: v1.25.0
Kustomize Version: v4.5.7
Server Version: v1.23.5
$ qhub --version
0.5.0.dev4+g9de633c

Compute environment

Azure

Integrations

No response

Anything else?

No response

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 10 (10 by maintainers)

Top GitHub Comments

1 reaction
alimanfoo commented, Dec 5, 2022

In case anyone else stumbles on this while working on GCP: I also found that some kube-system pods were blocking scale-down of user and/or worker nodes, particularly the kube-dns and metrics-server pods. I seem to have been able to overcome this by doing three things:

  • Set the cluster’s auto-scaling profile to “optimise utilisation” instead of the default “balanced” (makes scale-down happen faster).
  • Change the kube-dns-autoscaler config map as described here.
  • Get all system services to run on a specific node pool (i.e., not on user or worker nodes) as described here (e.g., use the “general” node pool).
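
The first of these steps maps to a single gcloud command. The sketch below covers only that step; the other two depend on the linked instructions and are not reproduced here. It assumes gcloud is installed and authenticated, and that the cluster location comes from the gcloud configuration. The cluster name `my-cluster`, the function name `autoscaling_profile_cmd`, and the `RUN` variable are placeholders for illustration.

```shell
#!/bin/sh
# Sketch of the first step only: switch a GKE cluster's autoscaling profile
# from the default "balanced" to "optimize-utilization".
# CLUSTER is a placeholder; the cluster location is assumed to be set in the
# gcloud configuration (compute/zone or compute/region).
CLUSTER="${CLUSTER:-my-cluster}"

# Build the gcloud command as a string so it can be reviewed before running.
# (Hypothetical helper name, used only for this illustration.)
autoscaling_profile_cmd() {
  printf '%s\n' "gcloud container clusters update $1 --autoscaling-profile optimize-utilization"
}

# Print the command by default; set RUN=1 to apply it to the cluster.
cmd="$(autoscaling_profile_cmd "$CLUSTER")"
if [ "${RUN:-0}" = "1" ]; then
  eval "$cmd"
else
  echo "$cmd"
fi
```

Printing the command first is a deliberate choice here, since changing the profile affects how aggressively every node pool in the cluster is scaled down.
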
1 reaction
tjcrone commented, Oct 17, 2022

Thank you very much for looking into this and finding this result! I agree this appears to be expected behavior considering the need for the user-scheduler pod. This will be a great help regarding our prepurchase plan. Cheers!
