question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

SqlClient. Inappropriate behavior after a commit timeout

See original GitHub issue

I’ve initially created this bug in the CoreFX repository: https://github.com/dotnet/corefx/issues/38900, but it does not look like anyone is looking there. So I’m repeating it here. A small addition: I’ve now tested this using both System.Data.SqlClient (4.6.1) and Microsoft.Data.SqlClient (1.0.19128.1-Preview), compiled for both .NET Core 2.2 & .NET Framework 4.8. In each of these four cases I was able to reproduce this bug.

=============================================================

I have a SQL Azure database and I use System.Data.SqlClient (4.6.1) to talk to it. From time to time (sometimes several times a day, sometimes no incidents for weeks), it would produce a SqlException with Number = -2 (Timeout) during a SqlTransaction.Commit call. I assume that’s happening because SQL Azure databases have mirroring set up and there is some transient network condition happening around Commit() which prevents that from finishing. That’s fine with me - i know those are transient exceptions and would retry them after some interval. The problem is that the connection which caused that exception is returned to the pool. The next time it’s got picked from the pool, it seems to be in some invalid state. Then, when i do “SELECT * FROM MyTable WHERE Id = @Id” on that connection, it would return an empty set, even though that record is 100% there in the table. That is one really disturbing bug.

I’ve tried to build a small reproduction for this issue. The hardest thing was to have SqlTransation.Commit() to throw a timeout exception. I invented a trick to do that (see below, ForceCommitTimeout() function). It’s a bit clumsy one and does not always reproduce. If it does not repoduce in your environment, try increasing RowNumber and running the code several times. From time to time it would produce an exception. Note, that in the connection string I’ve set “Connection Timeout=1”. That timeout value is also used as commit timeout by SqlTransaction. In any case, if someone knows a better way to force a timeout from SqlTransation.Commit() - please share it with me.

The second important point to build a stable reproduction was to set “Max Pool Size=1”. This is to make sure that the same connection is returned from the pool when you open a new connection after a commit timeout has happened.

Before you run the code, you need to create a database named “TestBed” on your DB server and then create a test table in it:

CREATE TABLE TestTable
(
    Id INTEGER NOT NULL IDENTITY(1, 1) PRIMARY KEY,
    Name NVARCHAR(MAX)
)

Now, when you are all set up, here’s the code to reproduce the problem:

using System;
using System.Data.SqlClient;
using System.Diagnostics;

namespace SqlClientTimeoutIssue
{
    internal class Program
    {
        private const string ConnectionString = "Server=(local); Initial Catalog=TestBed; Integrated Security=True; Connection Timeout=1; Max Pool Size=1";
        private const int RowNumber = 10000000;

        private static void Main()
        {
            try
            {
                SelectOne();
                ForceCommitTimeout();
                SelectOne();
            }
            catch (Exception ex)
            {
                Console.WriteLine("Unexpected exception: {0}", ex);
            }
        }

        private static void SelectOne()
        {
            using (var conn = new SqlConnection(ConnectionString))
            {
                conn.Open();
                Console.WriteLine("SelectOne. ConnectionId={0}", conn.ClientConnectionId);

                var result = 0;

                using (var cmd = conn.CreateCommand())
                {
                    cmd.CommandText = "SELECT 1";
                    using (var reader = cmd.ExecuteReader())
                    {
                        while (reader.Read())
                            result = reader.GetInt32(0);
                    }
                }

                Console.WriteLine("SelectOne. Result: {0}", result);
            }
        }

        private static void ForceCommitTimeout()
        {
            using (var conn = new SqlConnection(ConnectionString))
            {
                conn.Open();
                Console.WriteLine("ForceCommitTimeout. ConnectionId={0}", conn.ClientConnectionId);

                using (var tran = conn.BeginTransaction())
                {
                    var timer = Stopwatch.StartNew();

                    for (var i = 0; i < RowNumber; ++i)
                    {
                        using (var cmd = conn.CreateCommand())
                        {
                            cmd.CommandText = "INSERT TestTable (Name) VALUES(N'Some value')";
                            cmd.Transaction = tran;
                            cmd.ExecuteNonQuery();
                        }
                    }

                    Console.WriteLine("ForceCommitTimeout. Insert duration: {0} sec", timer.Elapsed.TotalSeconds);

                    try
                    {
                        timer.Restart();
                        tran.Commit();
                        Console.WriteLine("ForceCommitTimeout. Commit duration: {0} sec", timer.Elapsed.TotalSeconds);
                    }
                    catch (SqlException ex) when (ex.Number == -2)
                    {
                        Console.WriteLine("ForceCommitTimeout. Commit timeout has occurred");
                    }
                }
            }
        }
    }
}

When i run it and i’m lucky to hit a timeout, i get output like this:

SelectOne. ConnectionId=21106a9d-08db-43b5-b2e5-d6f9b751bebb
SelectOne. Result: 1
ForceCommitTimeout. ConnectionId=21106a9d-08db-43b5-b2e5-d6f9b751bebb
ForceCommitTimeout. Insert duration: 1799.231 sec
ForceCommitTimeout. Commit timeout has occurred
SelectOne. ConnectionId=21106a9d-08db-43b5-b2e5-d6f9b751bebb
SelectOne. Result: 0

Notice, that the second call to SelectOne() outputs “0”. Even though it does “SELECT 1”! And that is the essence of this (really weird) bug. I’ve been able to reproduce it on both SQL Azure and my local SQL Server 2017 Dev Edition.

Issue Analytics

  • State:closed
  • Created 4 years ago
  • Comments:18 (9 by maintainers)

github_iconTop GitHub Comments

2reactions
GSPPcommented, Aug 3, 2019

You might be able to trigger a commit timeout by suspending the sqlservr.exe process using Process Explorer. Open 100 transactions concurrently all under write load. Then pause the process. Unpause. You should get lucky on a few of those 100 transactions.

You also can set up mirroring or AG and pause the network.

Or, put the transaction log on storage that you can pause. You can pause an SMB file share by interrupting the network.

1reaction
dmytro-gokuncommented, Sep 2, 2019

@cheenamalhotra Thanks a lot! I’ve started using this release in our production environment. If I see the problem happening again, i will open a new issue.

Read more comments on GitHub >

github_iconTop Results From Across the Web

Getting SQL Server timeout error with correct credentials ...
I have seen this behavior after restoring a database backup from a different computer. This can be fixed by a SQL command in...
Read more >
How to perform server-side transaction timeout if COMMIT ...
How can SQL Server terminate one of its transactions if it exceeds preset timeout? Imagine the following transaction performed by traveling ...
Read more >
How to troubleshoot timeout problems and gather additional ...
One method to troubleshoot this, is to monitor what's happening in the database when the issue occurs with sp_WhoIsActive to capture the ...
Read more >
Timeout expired messages when connecting to SQL Server
A timeout error means that a certain operation takes longer than needed. The client application stops the operation (instead of waiting ...
Read more >
Understand and resolve SQL Server blocking problems
When an SPID receives a query timeout or a cancel, it will terminate the current query and batch but doesn't automatically roll back...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found