User node group will not scale to zero nodes on Azure

OS system and architecture in which you are running QHub

Mac M1

Expected behavior

With min_nodes: 0 set for the user node group, I expect the node group to scale to zero nodes when there are no active users.

Actual behavior

When no user notebook is active, the user node group stays up with one node, and these pods are running on it:

NAMESPACE      NAME                                   PF   READY     RESTARTS STATUS         CPU    MEM    %CPU/R    %CPU/L    %MEM/R    %MEM/L IP             NODE                             AGE
dev            qhub-prometheus-node-exporter-7dj6b    ●    1/1              0 Running          4     20       n/a       n/a       n/a       n/a 10.224.0.5     aks-user-41209954-vmss000003     38h
dev            user-scheduler-8f67c547d-gkxfv         ●    1/1              0 Running          2     21       n/a       n/a       n/a       n/a 10.244.9.2     aks-user-41209954-vmss000003     39h
dev            user-scheduler-8f67c547d-qjbt6         ●    1/1              0 Running          3     21       n/a       n/a       n/a       n/a 10.244.9.3     aks-user-41209954-vmss000003     39h
kube-system    azure-ip-masq-agent-v4zjf              ●    1/1              0 Running          1     16         1         0        32         6 10.224.0.5     aks-user-41209954-vmss000003     38h
kube-system    cloud-node-manager-58cx8               ●    1/1              0 Running          1     20         2         0        41         4 10.224.0.5     aks-user-41209954-vmss000003     38h
kube-system    csi-azuredisk-node-vv9zm               ●    3/3              0 Running          3     49        10       n/a        82        12 10.224.0.5     aks-user-41209954-vmss000003     38h
kube-system    csi-azurefile-node-26nxp               ●    3/3              0 Running          3     44        10       n/a        73         7 10.224.0.5     aks-user-41209954-vmss000003     38h
kube-system    kube-proxy-898w6                       ●    1/1              0 Running          1     25         1       n/a       n/a       n/a 10.224.0.5     aks-user-41209954-vmss000003     38h
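
To make the listing above easier to read, here is a minimal Python sketch that separates the pods a cluster autoscaler would normally ignore from the ones that may keep the node alive. It relies on a simplification: DaemonSet pods run on every node and are generally not counted when a node is considered for removal, while other pods have to be rescheduled elsewhere first. The `owner_kind` values are assumptions based on how these components are commonly deployed; they do not appear in the listing, and the `Pod` class and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    namespace: str
    name: str
    owner_kind: str  # assumed for illustration; not shown in the listing


# Pods copied from the listing above. The owner kinds are assumptions,
# not facts taken from the issue.
PODS_ON_USER_NODE = [
    Pod("dev", "qhub-prometheus-node-exporter-7dj6b", "DaemonSet"),
    Pod("dev", "user-scheduler-8f67c547d-gkxfv", "ReplicaSet"),
    Pod("dev", "user-scheduler-8f67c547d-qjbt6", "ReplicaSet"),
    Pod("kube-system", "azure-ip-masq-agent-v4zjf", "DaemonSet"),
    Pod("kube-system", "cloud-node-manager-58cx8", "DaemonSet"),
    Pod("kube-system", "csi-azuredisk-node-vv9zm", "DaemonSet"),
    Pod("kube-system", "csi-azurefile-node-26nxp", "DaemonSet"),
    Pod("kube-system", "kube-proxy-898w6", "DaemonSet"),
]


def pods_that_may_hold_node(pods):
    """Return the pods that are not owned by a DaemonSet.

    Simplified rule: DaemonSet pods are skipped, and every other pod is
    treated as something that may prevent the node from being removed.
    """
    return [pod for pod in pods if pod.owner_kind != "DaemonSet"]


holders = pods_that_may_hold_node(PODS_ON_USER_NODE)
for pod in holders:
    print(f"{pod.namespace}/{pod.name}")
```

Under these assumptions, only the two user-scheduler pods remain, which is consistent with the conclusion later in the thread that the user-scheduler pod is why the node stays up.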

How to Reproduce the problem?

Create a QHub deployment on Azure with min_nodes: 0 set for the user node group.

Command output

No response

Versions and dependencies used.

I am currently using my fork of QHub, which upgrades the azurerm provider to 3.22 so that QHub works on the Mac M1. See #1430.

$ conda --version
conda 4.14.0
$ kubectl version
Client Version: v1.25.0
Kustomize Version: v4.5.7
Server Version: v1.23.5
$ qhub --version
0.5.0.dev4+g9de633c

Compute environment

Azure

Integrations

No response

Anything else?

No response

Issue Analytics

Created a year ago
Comments: 10 (10 by maintainers)

Top GitHub Comments

alimanfoo commented, Dec 5, 2022

In case anyone else stumbles on this while working on GCP: I also found that some kube-system pods were blocking scale-down of user and/or worker nodes, particularly the kube-dns and metrics-server pods. I seem to have been able to overcome this by doing three things:

1. Set the cluster’s auto-scaling profile to “optimise utilisation” instead of the default “balanced” (makes scale-down happen faster).
2. Change the kube-dns-autoscaler config map as described here.
3. Get all system services to run on a specific node pool (i.e., not on user or worker nodes) as described here (e.g., use the “general” node pool).

tjcrone commented, Oct 17, 2022

Thank you very much for looking into this and finding this result! I agree this appears to be expected behavior considering the need for the user-scheduler pod. This will be a great help regarding our prepurchase plan. Cheers!
