Is there a way to add a retry policy for transient failures in database operations?
We are running our clustered job scheduler in Azure. From time to time a database connection is interrupted or a SQL command fails because of a transient network failure. Is there a way to configure a retry policy so that, for example, when a trigger state update fails, a retry is applied immediately?
I am looking for a way to avoid restarting the scheduler.
Thank you!
Quartz.JobPersistenceException: Couldn't update states of blocked triggers: Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.Data.SqlClient.SqlException: Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception: The wait operation timed out
--- End of inner exception stack trace ---
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.SqlCommand.InternalEndExecuteNonQuery(IAsyncResult asyncResult, String endMethod, Boolean isInternal)
at System.Data.SqlClient.SqlCommand.EndExecuteNonQueryInternal(IAsyncResult asyncResult)
at System.Data.SqlClient.SqlCommand.EndExecuteNonQueryAsync(IAsyncResult asyncResult)
at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Quartz.Impl.AdoJobStore.StdAdoDelegate.<UpdateTriggerStatesForJobFromOtherState>d__70.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Quartz.Impl.AdoJobStore.JobStoreSupport.<TriggerFired>d__237.MoveNext()
--- End of inner exception stack trace ---
at Quartz.Impl.AdoJobStore.JobStoreSupport.<TriggerFired>d__237.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Quartz.Impl.AdoJobStore.JobStoreSupport.<>c__DisplayClass236_0.<<TriggersFired>b__0>d.MoveNext() [See nested exception: System.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.SqlCommand.InternalEndExecuteNonQuery(IAsyncResult asyncResult, String endMethod, Boolean isInternal)
at System.Data.SqlClient.SqlCommand.EndExecuteNonQueryInternal(IAsyncResult asyncResult)
at System.Data.SqlClient.SqlCommand.EndExecuteNonQueryAsync(IAsyncResult asyncResult)
at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Quartz.Impl.AdoJobStore.StdAdoDelegate.<UpdateTriggerStatesForJobFromOtherState>d__70.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Quartz.Impl.AdoJobStore.JobStoreSupport.<TriggerFired>d__237.MoveNext()
Issue Analytics
- State:
- Created 5 years ago
- Reactions:1
- Comments:7 (5 by maintainers)
Top GitHub Comments
@puddlewitt you are indeed correct that the logic won’t take action here. I’ve opened an issue on the Java side to discuss what the correct action is, as there is a logic fault as far as I understand. The retry logic should come from the JobStoreSupport level, where the transaction is retried on error.
I don’t think TriggerFired is covered by IsTransient because it doesn’t call RollbackConnection. It doesn’t appear to then be caught by ReleaseAcquiredTrigger because the exception type doesn’t match.
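To make the idea of retrying "on error" at the transaction level concrete, here is a minimal sketch of a generic retry wrapper with exponential back-off. It is not Quartz.NET code and does not reflect how JobStoreSupport is implemented; `TransientRetry`, `ExecuteAsync`, the `isTransient` predicate, and the default attempt count and delay are all hypothetical names and values chosen for illustration.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not part of Quartz.NET.
public static class TransientRetry
{
    // Runs an async operation and retries it with exponential back-off
    // for as long as isTransient(ex) is true and attempts remain.
    public static async Task<T> ExecuteAsync<T>(
        Func<Task<T>> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts = 3,               // assumed default
        TimeSpan? initialDelay = null)     // assumed default: 200 ms
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        }

        TimeSpan delay = initialDelay ?? TimeSpan.FromMilliseconds(200);

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation().ConfigureAwait(false);
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Transient failure with attempts remaining: wait, then retry.
                await Task.Delay(delay).ConfigureAwait(false);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
            // Non-transient failures, and the final failed attempt,
            // propagate to the caller unchanged.
        }
    }
}
```

For the timeout in the stack trace above, a predicate could test for a SqlException whose Number is -2 (the client-side command timeout). A wrapper like this is only safe around a unit of work that can be repeated as a whole, such as an entire transaction that was rolled back, which is why the retry belongs at the level where the transaction is opened rather than around a single command.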