[Feature Request] Fallback queues
Hi,
I often enqueue an experiment in a queue that uses on-demand GPU instances in AWS, and the ClearML AWS autoscaler keeps failing with the following error:
Error: Failed to start new instance, An error occurred (InsufficientInstanceCapacity) when calling the RunInstances operation (reached max retries: 4): We currently do not have sufficient g4dn.2xlarge capacity in the Availability Zone you requested (eu-west-1a). Our system will be working on provisioning additional capacity. You can currently get g4dn.2xlarge capacity by not specifying an Availability Zone in your request or choosing eu-west-1b, eu-west-1c
I wonder if there is an easy way to extend the AWS autoscaler to detect such InsufficientInstanceCapacity errors and use a different availability zone. Since this would require some other AWS properties (e.g. subnet, security groups, etc.) to be different as well, we could think of having a “fallback to queue” mechanism in the AWS autoscaler. This mechanism would work as follows:
- In the AWS autoscaler configuration, I specify the fallback queues for a specific queue
- If the autoscaler fails to spin up an instance for that queue, it tries to start an instance in one of the fallback queues
In practice, this would allow one queue configuration per availability zone. The autoscaler could then spin up an instance faster.
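To make the proposal concrete, here is a minimal sketch of the fallback logic in plain Python. It is not the ClearML autoscaler's actual code or configuration format: the `FALLBACK_QUEUES` mapping, the queue names, the `InsufficientCapacityError` class and the `launch` callable are all hypothetical names invented for illustration. In a real integration the capacity failure would presumably be recognized from the AWS error code (InsufficientInstanceCapacity) instead of a custom exception.

```python
from typing import Callable, Dict, List, Optional


class InsufficientCapacityError(Exception):
    """Hypothetical stand-in for an AWS InsufficientInstanceCapacity failure."""


# Hypothetical configuration: each queue lists the queues to try if it fails.
# One queue per availability zone, as suggested in the request.
FALLBACK_QUEUES: Dict[str, List[str]] = {
    "gpu-eu-west-1a": ["gpu-eu-west-1b", "gpu-eu-west-1c"],
}


def spin_up_with_fallback(
    queue: str,
    launch: Callable[[str], None],
    fallbacks: Dict[str, List[str]] = FALLBACK_QUEUES,
) -> Optional[str]:
    """Try the requested queue first, then each of its fallback queues in order.

    Returns the name of the queue whose instance started, or None if every
    candidate failed with a capacity error.
    """
    for candidate in [queue, *fallbacks.get(queue, [])]:
        try:
            launch(candidate)  # assumed to raise on capacity failure
            return candidate
        except InsufficientCapacityError:
            continue  # no capacity here, move on to the next queue
    return None


def fake_launch(queue: str) -> None:
    """Simulated launcher: pretend eu-west-1a has no capacity."""
    if queue.endswith("1a"):
        raise InsufficientCapacityError(queue)


print(spin_up_with_fallback("gpu-eu-west-1a", fake_launch))  # gpu-eu-west-1b
```

Only capacity errors trigger the fallback in this sketch; any other failure propagates to the caller, so misconfigured queues are not silently skipped.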
Issue Analytics
- State:
- Created 2 years ago
- Comments:5 (3 by maintainers)
yes, this is exactly what I have in mind 👍
Great news, thanks Martin
On Sat, 8 Jan 2022, 02:31 Martin.B, @.***> wrote: