[Feature Request] Fallback queues

See original GitHub issue: https://github.com/allegroai/clearml/issues/493

Hi,

I often enqueue experiments in a queue that uses on-demand GPU instances on AWS, and the ClearML AWS autoscaler keeps failing with the following error:

Error: Failed to start new instance, An error occurred (InsufficientInstanceCapacity) when calling the RunInstances operation (reached max retries: 4): We currently do not have sufficient g4dn.2xlarge capacity in the Availability Zone you requested (eu-west-1a). Our system will be working on provisioning additional capacity. You can currently get g4dn.2xlarge capacity by not specifying an Availability Zone in your request or choosing eu-west-1b, eu-west-1c

I wonder if there is an easy way to extend the AWS autoscaler so that it detects InsufficientInstanceCapacity errors and retries in a different availability zone. Because a different availability zone also means that some other AWS properties (e.g. subnet, security groups) must change, we could instead add a "fallback queue" mechanism to the AWS autoscaler. This mechanism would work as follows:

  1. In the AWS autoscaler configuration, I specify the fallback queues for a given queue.
  2. If the autoscaler fails to spin up an instance for that queue, it tries to start an instance for one of its fallback queues instead.

In practice, this would allow having one queue configuration per availability zone, so the autoscaler could spin up an instance faster.
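The proposed fallback logic could look roughly like the minimal Python sketch below. It is an illustration of the idea, not the ClearML autoscaler's actual code or configuration format: `QueueConfig`, `launch_with_fallback`, `InsufficientCapacityError`, the queue names, and the subnet IDs are all hypothetical. The `launch` callable stands in for whatever actually starts the instance; in a boto3-based autoscaler it would presumably call `run_instances` and re-raise a `botocore.exceptions.ClientError` whose error code is `InsufficientInstanceCapacity` as `InsufficientCapacityError`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


class InsufficientCapacityError(Exception):
    """Hypothetical stand-in for an EC2 InsufficientInstanceCapacity failure."""


@dataclass
class QueueConfig:
    # Hypothetical per-queue settings: one queue per availability zone,
    # each with its own zone-specific properties (e.g. subnet).
    name: str
    availability_zone: str
    subnet_id: str
    fallback_queues: List[str] = field(default_factory=list)


def launch_with_fallback(
    queue_name: str,
    configs: Dict[str, QueueConfig],
    launch: Callable[[QueueConfig], str],
) -> Optional[str]:
    """Try the requested queue first, then each of its fallback queues in order.

    Returns the instance ID from the first successful launch, or None if
    every candidate queue ran out of capacity. Only the requested queue's
    own fallback list is used (fallbacks of fallbacks are not followed).
    """
    candidates = [queue_name] + configs[queue_name].fallback_queues
    for name in candidates:
        cfg = configs[name]
        try:
            return launch(cfg)
        except InsufficientCapacityError:
            # Capacity problem in this zone only; try the next queue.
            continue
    return None


# Usage with a fake launcher: eu-west-1a is out of capacity.
def fake_launch(cfg: QueueConfig) -> str:
    if cfg.availability_zone == "eu-west-1a":
        raise InsufficientCapacityError(cfg.availability_zone)
    return f"i-{cfg.availability_zone}"


configs = {
    "gpu_1a": QueueConfig("gpu_1a", "eu-west-1a", "subnet-aaa", ["gpu_1b", "gpu_1c"]),
    "gpu_1b": QueueConfig("gpu_1b", "eu-west-1b", "subnet-bbb"),
    "gpu_1c": QueueConfig("gpu_1c", "eu-west-1c", "subnet-ccc"),
}

if __name__ == "__main__":
    print(launch_with_fallback("gpu_1a", configs, fake_launch))  # i-eu-west-1b
```

Only capacity errors trigger a fallback; any other exception propagates, so a genuine misconfiguration is not silently masked by cycling through other zones.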

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
H4dr1en commented, Nov 21, 2021

yes, this is exactly what I have in mind 👍

0 reactions
tienduccao commented, Jan 8, 2022

Great news, thanks Martin

On Sat, 8 Jan 2022, 02:31 Martin.B, @.***> wrote:

Thanks for the ping @tienduccao! Things were delayed a bit, but I can give an update: the GCP is ready and will be released to the community (SaaS) version and then synced back to the repository. I'm hoping it will not take more than a couple of weeks 😃

