Delay first retry in Transient Error Handling with Azure SQL

See original GitHub issue

The published advice on transient error handling for Azure SQL recommends:

  • Delaying “several” seconds before the first retry
  • Closing and opening the connection prior to each retry

I’ve poked around in SqlServerRetryingExecutionStrategy and related classes and can’t find any evidence that either of those two recommendations is followed in EF Core, nor have I been able to figure out how I might implement them in a custom execution strategy.

Additionally, I’ve found that an “Execution Timeout Expired” exception (error number -2) is explicitly not considered transient, yet it is the single most frequent exception we encounter in our non-EF database code. The retry strategy we’ve implemented for that non-EF code closes and re-opens the connection before retrying the query, and it has completely eliminated failures due to timeout exceptions. I’ve had to add error number -2 to the errorNumbersToAdd list for EF Core, but because the connection isn’t closed and re-opened, I have zero expectation that retries for those errors will succeed.
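For reference, adding error number -2 to the built-in retrying strategy might look roughly like the sketch below. The context class name, the connection string placeholder, and the retry count and delay values are assumptions for illustration and do not come from the issue. Only EnableRetryOnFailure and its errorNumbersToAdd parameter are taken from the EF Core SQL Server provider.

```csharp
using System;
using Microsoft.EntityFrameworkCore;

// Hypothetical context class; the name and connection string are placeholders.
public class AppDbContext : DbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        optionsBuilder.UseSqlServer(
            "<connection string>",
            sqlOptions => sqlOptions.EnableRetryOnFailure(
                maxRetryCount: 5,                         // assumed value
                maxRetryDelay: TimeSpan.FromSeconds(30),  // assumed value
                // Treat SQL timeout (-2) as transient in addition to the built-in list.
                errorNumbersToAdd: new[] { -2 }));
    }
}
```

This only changes which errors are retried. It does not close and re-open the connection between attempts, which is the gap described above.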

Is there a plan to support the recommended transient error handling when targeting Azure SQL? Is there a way I can implement a custom execution strategy that will close and re-open the database connection?

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 4
  • Comments: 13 (10 by maintainers)

Top GitHub Comments

4 reactions
martyt commented, Jul 7, 2022

@MNF this is what we’re using now. I removed all the extra logic to close and reopen the connection since it turned out to be unnecessary. The only things this strategy does differently from the built-in one are to use the DecorrelatedJitterBackoffV2 method (from the Polly.Contrib.WaitAndRetry package) to compute delays, and to treat SQL timeout (-2) errors, IOException, and SocketException as transient errors that should be retried.

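using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Sockets;
using Microsoft.Data.SqlClient;
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Storage;
using Polly.Contrib.WaitAndRetry;

/// <summary>
/// A custom SQL execution strategy for Entity Framework Core that uses the DecorrelatedJitterBackoffV2 method from Polly
/// to calculate the delay before the next retry.
/// </summary>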
public class CustomEfCoreSqlExecutionStrategy : SqlServerRetryingExecutionStrategy
{
    private readonly IEnumerable<TimeSpan> _backoffDelays;

    public CustomEfCoreSqlExecutionStrategy(
        ExecutionStrategyDependencies dependencies,
        int maxRetryCount,
        TimeSpan maxRetryDelay,
        ICollection<int> errorNumbersToAdd
    ) : base(dependencies, maxRetryCount, maxRetryDelay, errorNumbersToAdd)
    {
        _backoffDelays = Backoff.DecorrelatedJitterBackoffV2(TimeSpan.FromSeconds(5), maxRetryCount);
    }

    /// <summary>
    /// Get the delay before the next retry using Polly's DecorrelatedJitterV2 implementation
    /// </summary>
    /// <param name="lastException"></param>
    /// <returns></returns>
    protected override TimeSpan? GetNextDelay(Exception lastException)
    {
        var currentRetryCount = ExceptionsEncountered.Count - 1;
        if (currentRetryCount < MaxRetryCount)
            return _backoffDelays.ElementAt(currentRetryCount);

        return null;
    }

    /// <summary>
    /// Experience has shown that we frequently encounter SqlExceptions that have an unknown error number but have an
    /// IOException and/or SocketException as the inner exception. So we need to treat those as transient.
    ///
    /// Also, the default EF Core retry strategy explicitly excludes SQL Timeout errors (-2), but those errors are the most frequent
    /// that we see in the wild. 
    /// </summary>
    /// <param name="exception"></param>
    /// <returns></returns>
    protected override bool ShouldRetryOn(Exception exception)
    {
        // a SqlException wrapping an IOException or SocketException is considered transient
        // error number -2 (timeout) is also transient, even though the base implementation says otherwise
        var shouldRetry =
            exception is SqlException {InnerException: IOException or SocketException}
                or SqlException {Number: -2} || base.ShouldRetryOn(exception);

        return shouldRetry;
    }
}
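A strategy like the one above still has to be registered with the provider. A minimal sketch of that wiring, assuming the class above is in scope, follows. The context class name, the connection string placeholder, and the retry count and delay values are hypothetical and do not come from the comment.

```csharp
using System;
using Microsoft.EntityFrameworkCore;

// Hypothetical context class; the name and connection string are placeholders.
public class AppDbContext : DbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        optionsBuilder.UseSqlServer(
            "<connection string>",
            // Replace the default strategy with the custom one defined above.
            sqlOptions => sqlOptions.ExecutionStrategy(
                dependencies => new CustomEfCoreSqlExecutionStrategy(
                    dependencies,
                    maxRetryCount: 5,                         // assumed value
                    maxRetryDelay: TimeSpan.FromSeconds(30),  // assumed value
                    // -2 is already handled in ShouldRetryOn, so nothing extra is added here.
                    errorNumbersToAdd: Array.Empty<int>())));
    }
}
```

Because GetNextDelay is overridden, the delays come from the Polly backoff sequence seeded with TimeSpan.FromSeconds(5), and the maxRetryDelay passed to the base class does not cap them.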
2 reactions
AndriySvyryd commented, Jan 18, 2023

@stevendarby We’ll probably add an Azure-specific execution strategy that the user needs to choose explicitly.


