
failure to autoscale unless workers are already present


I am testing the PBSCluster along with autoscaling. It seems that I am unable to get the cluster to launch any workers without explicitly starting at least one worker. I would expect that this configuration would scale from 0 to 10 (180 processes) without further interaction/configuration.

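    from dask.distributed import Client
    from dask_jobqueue import PBSCluster

    # One PBS job = one 36-core node running 18 worker processes x 4 threads.
    # Keyword names are those used in this 2018 report; later dask-jobqueue
    # releases changed some of them.
    cluster = PBSCluster(queue='default',
                         walltime='01:00:00',
                         project='MyAccount',
                         resource_spec='1:ncpus=36:mpiprocs=36:mem=109GB',
                         interface='ib0',
                         threads=4,
                         processes=18)
    client = Client(cluster)
    # Expected: grow from 0 to 10 jobs (10 x 18 = 180 worker processes) on demand.
    cluster.adapt(minimum=0, maximum=10)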

@mrocklin, this may actually be a problem with Dask's adaptive scaling, but I wanted to discuss it here in case I am missing something obvious that is specific to PBS.
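Purely as an illustration of how a "stuck at zero" symptom can arise, and not as a description of Dask's actual `Adaptive` implementation or of what #63 changed, the sketch below compares two hypothetical scaling rules. The names `proportional_target`, `demand_target`, and `tasks_per_process` are invented for this example. A rule that sizes the cluster relative to the jobs it already has returns 0 forever when it starts from 0, while a rule driven by pending work can leave zero and is then clamped to `minimum`/`maximum`. It also spells out the report's arithmetic: with `processes=18`, 10 jobs means 180 worker processes.

```python
import math

PROCESSES_PER_JOB = 18  # matches processes=18 in the report


def jobs_to_processes(jobs: int) -> int:
    """Number of worker processes launched by `jobs` PBS jobs."""
    return jobs * PROCESSES_PER_JOB


def proportional_target(current_jobs: int, growth: float = 2.0,
                        maximum: int = 10) -> int:
    """Hypothetical naive rule: grow relative to the current job count.

    Starting from zero jobs this always returns zero, so it can never scale up.
    """
    return min(maximum, int(current_jobs * growth))


def demand_target(pending_tasks: int, tasks_per_process: int,
                  minimum: int = 0, maximum: int = 10) -> int:
    """Hypothetical demand-driven rule: size the cluster from queued work.

    Converts pending tasks to worker processes, then to whole jobs, and
    clamps the result to [minimum, maximum].
    """
    processes_needed = math.ceil(pending_tasks / tasks_per_process)
    jobs_needed = math.ceil(processes_needed / PROCESSES_PER_JOB)
    return max(minimum, min(maximum, jobs_needed))


if __name__ == "__main__":
    print(jobs_to_processes(10))     # 180
    print(proportional_target(0))    # 0: never leaves zero
    print(demand_target(pending_tasks=1000, tasks_per_process=2))  # 10 (capped)
```

Treat the demand-driven rule as a design sketch only; real adaptive logic typically considers more signals, such as how long queued tasks are expected to take.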

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction
mrocklin commented, Mar 29, 2018

Can you report the contents of cluster._adaptive.log?

(Email reply to Joe Hamman's original report of Thu, Mar 29, 2018, 3:35 PM: https://github.com/dask/dask-jobqueue/issues/26)
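`cluster._adaptive.log` is a private attribute, so its contents are not a stable interface and may differ between versions. The helper below is a minimal sketch that assumes each entry is a `(timestamp, action, details)` tuple, where `action` is a short label such as `"up"` or `"down"`; both that layout and the name `summarize_adaptive_log` are assumptions for illustration. If the log stays empty after work has been submitted, that would suggest the adaptive loop never decided to scale up.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Any, Iterable, Tuple

# Assumed entry layout: (unix_timestamp, action, details). This is a guess
# for illustration, not a documented Dask API.
LogEntry = Tuple[float, str, Any]


def summarize_adaptive_log(entries: Iterable[LogEntry]) -> Counter:
    """Print each entry with a UTC timestamp and count entries per action."""
    counts: Counter = Counter()
    for timestamp, action, details in entries:
        when = datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()
        print(f"{when}  {action:<6} {details}")
        counts[action] += 1
    if not counts:
        print("No scaling decisions recorded.")
    return counts
```

With a live cluster, something like `summarize_adaptive_log(cluster._adaptive.log)` would print the recorded decisions, provided the entries really have the assumed shape.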

0 reactions
jhamman commented, Jul 16, 2018

closed via #63
