Support adaptive scaling in Helm cluster

As I understand it, the difference between KubeCluster and HelmCluster is:

  1. KubeCluster starts the scheduler where the client runs; the worker resources come from Kubernetes.
  2. HelmCluster has a long-running scheduler pod in the Kubernetes cluster.

My requirement is: I would like a long-running scheduler in the cluster that multiple clients can connect to in order to submit tasks, with worker resources coming from the same Kubernetes cluster as the scheduler and scaling up and down based on load, like what KubeCluster provides.

It seems like a combination of KubeCluster and HelmCluster. Did the community consider this case when Kubernetes support was added? Are there any technical blockers? If this is something reasonable, I can help work on this feature request.
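
To make the requirement concrete, here is a minimal sketch of the two halves; the scheduler address is a placeholder that depends on the Helm release and namespace, and the adaptive part is what KubeCluster's adapt() already provides and what this request asks HelmCluster to support:

from dask.distributed import Client

# Any number of clients can already share the long-running Helm-deployed scheduler
# by connecting to its service address (the address below is a placeholder).
client = Client("tcp://dask-scheduler:8786")
print(client.submit(sum, [1, 2, 3]).result())

# The missing half: scaling the Helm-managed workers up and down with the load,
# the way cluster.adapt(minimum=..., maximum=...) already does on KubeCluster.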

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
jacobtomlinson commented, Jan 8, 2021

Sure, I understand that. This is probably the kind of question that would be asked on a forum, which we have discussed creating in the past.

> Also, you should let people sponsor you on Github!

That’s very kind of you, but I’ll leave my employer to do the sponsoring 😄.

If you want to give back to the Dask community, you can donate via NumFOCUS.

1 reaction
omarsumadi commented, Jan 7, 2021

Hi @jacobtomlinson - I wanted to piggyback on this exact question to perhaps add some clarity for people who are looking at Dask as a small-business solution for scheduling workflows. By the way, thanks for everything you have done - we need more people like you.

I am at a crossroads over how my small business should deploy Dask as a way for our projected ~10 analysts to execute long-running Python computations. Here's the workflow I run:

  • Someone submits their code through our Admin interface
  • That code is sent to our Django Webserver pod running inside of Kubernetes
  • The code is processed by either threads or processes, depending on what the user specifies and on whether the GIL is released (as in a Dask DataFrame operation); see the sketch after this list
  • The number of workers is known beforehand (our analysts have to specify how many processes/threads they want)
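
A minimal sketch of how that thread-versus-process choice could map onto worker pod specs, assuming make_pod_spec's threads_per_worker argument; the sizes and counts are made-up placeholders:

from dask_kubernetes import KubeCluster, make_pod_spec

# GIL-releasing work (e.g. dask.dataframe): fewer, larger workers with several threads each.
threaded_spec = make_pod_spec(image='daskdev/dask:latest',
                              threads_per_worker=4,
                              memory_limit='8G', memory_request='8G',
                              cpu_limit=4, cpu_request=4)

# GIL-bound work: many small single-threaded workers, i.e. parallelism via processes/pods.
process_spec = make_pod_spec(image='daskdev/dask:latest',
                             threads_per_worker=1,
                             memory_limit='2G', memory_request='2G',
                             cpu_limit=1, cpu_request=1)

cluster = KubeCluster(threaded_spec)   # or process_spec, chosen per task
cluster.scale(4)                       # the count the analyst specified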

My Attempts: I initially considered three ways of setting up our infrastructure:

  1. Launch the Dask Helm chart and enable horizontal autoscaling by setting a metric to scale off CPU, as shown in articles like this one: https://levelup.gitconnected.com/deploy-and-scale-your-dask-cluster-with-kubernetes-9e43f7e24b04
  2. Launch the Dask Helm chart and use my database to keep a count of how many workers I need and how many are active (so a database push before and after each Dask process), then manually scale that way using client.cluster.scale() (see the sketch after this list). The problem is that workers are again not terminated gracefully, and a running task could be killed.
  3. Use Dask-Kubernetes as you've outlined in this post, as I try to work out below whether it's right for us.
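
The manual-scaling route in option 2 looks roughly like the sketch below today; the release name and worker count are placeholders:

from dask_kubernetes import HelmCluster
from dask.distributed import Client

# Attach to the scheduler that the Helm chart deployed (release name is an assumption).
cluster = HelmCluster(release_name="dask")
client = Client(cluster)

# Manual scaling works today: it changes the worker deployment's replica count.
cluster.scale(5)

# Adaptive scaling (cluster.adapt()) is not supported on HelmCluster yet;
# that gap is exactly what this issue is about.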

The Actual Question: I was wondering whether this is the right way to do it, picking up from where I left off with KubeCluster:

  • Code is sent to my Django Webserver inside of a Kubernetes pod
  • Create a new KubeCluster using a worker spec for that specific task; that way I can choose larger workers for more threads or smaller workers for more processes, using something like this:
from dask_kubernetes import KubeCluster, make_pod_spec

# Build a pod spec for this task's workers, then scale the cluster to the requested size.
pod_spec = make_pod_spec(image='daskdev/dask:latest',
                         memory_limit='4G', memory_request='4G',
                         cpu_limit=1, cpu_request=1,
                         env={'EXTRA_PIP_PACKAGES': 'fastparquet git+https://github.com/dask/distributed'})
cluster = KubeCluster(pod_spec)
cluster.scale(10)
  • Scale the KubeCluster to the amount of resources our analyst defined.
  • Let Google Kubernetes Engine handle scaling nodes to create space for the KubeCluster.
  • Close the KubeCluster by calling cluster.close() and client.close() when the task is done.
  • That way, we don't hand scaling off to Kubernetes, but keep it all within Dask (a rough end-to-end sketch follows below).
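
Put together, the per-task lifecycle described above might look roughly like this sketch; the pod sizes, worker count, and analyst_code callable are placeholders for whatever the analyst submitted:

from dask.distributed import Client
from dask_kubernetes import KubeCluster, make_pod_spec

def run_task(analyst_code, n_workers):
    # Placeholder spec: in practice, build it from whatever the analyst requested.
    pod_spec = make_pod_spec(image='daskdev/dask:latest',
                             memory_limit='4G', memory_request='4G',
                             cpu_limit=1, cpu_request=1)

    # Context managers guarantee cluster.close()/client.close() run when the task is done,
    # so the worker pods are released and GKE can scale the nodes back down.
    with KubeCluster(pod_spec) as cluster:
        cluster.scale(n_workers)          # the count the analyst specified
        with Client(cluster) as client:
            future = client.submit(analyst_code)   # placeholder for the real submission
            return future.result()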

Will spread the love if this is answered and the last implementation I outlined turns out to be the way to go! If I wrote something confusing, I'll be more than happy to correct myself.
