User-defined retries
Briefest of discussions with Jose. NOTE: All naming up in the air.
Enable a runtime attribute such as retryOnStderrPattern that populates a value retryAttempt / retry / retryCount / retry_count / etc. This will enable tasks such as:
task mytask {
  command {
    mycommand.sh
  }
  runtime {
    retryOnStderrPattern: "(OutOfMemoryError|disk quota exceeded)"
    memory: (6 * retryAttempt) + "GB"
    disk: "local-disk " + (100 * retryAttempt) + " SSD"
    docker: "myrepo/myimage"
  }
}
When stderr contains a match for the specified regular expression pattern, the job should be retried with the counter incremented.
Not discussed, afaik: how to limit the number of attempts. Another runtime attribute, a backend config value, both, or something else?
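To make the proposed semantics concrete, here is a rough Python sketch of what an engine-side retry loop might look like. This is not Cromwell code. The names `run_with_retries`, `resources_for`, `run_job` and `max_attempts` are all hypothetical, the counter is assumed to start at 1 (so the first attempt gets 6GB and 100 of disk, as in the task above), and the cap on attempts is just one possible answer to the open question about limits.

```python
import re
from typing import Callable, Dict, Tuple


def resources_for(attempt: int) -> Dict[str, str]:
    """Mirror the runtime expressions from the example task for one attempt.

    Assumes the attempt counter starts at 1 (hypothetical; not settled above).
    """
    return {
        "memory": f"{6 * attempt}GB",
        "disk": f"local-disk {100 * attempt} SSD",
    }


def run_with_retries(
    run_job: Callable[[Dict[str, str]], str],
    retry_on_stderr_pattern: str,
    max_attempts: int = 3,
) -> Tuple[int, str]:
    """Run a job, retrying while its stderr matches the given pattern.

    run_job is a stand-in for the backend: it receives the resources for
    the current attempt and returns the job's stderr as a string.
    Returns (number of attempts used, stderr of the last attempt).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    pattern = re.compile(retry_on_stderr_pattern)
    stderr = ""
    for attempt in range(1, max_attempts + 1):
        stderr = run_job(resources_for(attempt))
        if not pattern.search(stderr):
            # No match: stop retrying (the job succeeded, or it failed
            # for a reason the user did not ask to retry on).
            return attempt, stderr
    # The cap was reached while stderr still matched the pattern.
    return max_attempts, stderr


if __name__ == "__main__":
    seen = []

    def flaky_job(resources: Dict[str, str]) -> str:
        # Pretend the job runs out of memory on its first two attempts.
        seen.append(resources["memory"])
        return "java.lang.OutOfMemoryError" if len(seen) < 3 else ""

    attempts, last_stderr = run_with_retries(
        flaky_job, "(OutOfMemoryError|disk quota exceeded)", max_attempts=5
    )
    print(attempts, seen)
```

Note that this sketch retries purely on the stderr match and never looks at the exit code. Whether a real implementation should also require a non-zero return code is another thing the proposal leaves open.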
Issue Analytics
- Created 7 years ago
- Reactions: 2
- Comments: 35 (10 by maintainers)
Top GitHub Comments
FWIW this has been discussed as a key feature for a WDL push next quarter
Could I propose that this be re-prioritized? It would help us deal with transient GCS hiccups in production (e.g., connections suddenly getting closed). Individual tools in GATK and Picard can’t possibly catch every exception across every library involved, so an execution-framework-level retry at the job level would help enormously.