SqlConnection.Open() does not fail over to the specified Failover Partner on Linux.
The main problem is:
We use a SQL connection string with a Failover Partner server and Connect Timeout = 30. The program runs on Linux (Ubuntu 19.10). When the Data Source server is unavailable (turned off or blocked by a firewall), SqlConnection.Open() does not connect to the Failover Partner server, even though that server is available and acting as the SQL principal.
Description of the cause:
As we discovered, the problem is in the SNITCPHandle.Connect(…) method: it does not abort the connection attempt at the specified timeout. https://github.com/dotnet/SqlClient/blob/3ca39848735143055d9d7d4864d5f1bfd1976b99/src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SNI/SNITcpHandle.cs
It uses a CancellationTokenSource with a Cancel() callback, but the callback does not actually cancel the socket.Connect() operation.
Here is the relevant part of the code, with additional debug logging:
private static Socket Connect(string serverName, int port, TimeSpan timeout, bool isInfiniteTimeout)
{
    var sw = Stopwatch.StartNew();
    ...
    CancellationTokenSource cts = null;
    void Cancel()
    {
        for (int i = 0; i < sockets.Length; ++i)
        {
            try
            {
                if (sockets[i] != null && !sockets[i].Connected)
                {
                    Console.WriteLine($"{sw.Elapsed} Disposing socket {i}...");
                    sockets[i].Dispose();
                    Console.WriteLine($"{sw.Elapsed} Disposed {i}");
                    sockets[i] = null;
                }
            }
            catch { }
        }
    }
    if (!isInfiniteTimeout)
    {
        cts = new CancellationTokenSource(timeout);
        cts.Token.Register(Cancel);
    }
    Socket availableSocket = null;
    try
    {
        for (int i = 0; i < sockets.Length; ++i)
        {
            try
            {
                if (ipAddresses[i] != null)
                {
                    sockets[i] = new Socket(ipAddresses[i].AddressFamily, SocketType.Stream, ProtocolType.Tcp);
                    Console.WriteLine($"{sw.Elapsed} Connecting socket {i}...");
                    sockets[i].Connect(ipAddresses[i], port);
                    Console.WriteLine($"{sw.Elapsed} Connect socket finished {i}.");
                    ...
                }
            }
            catch (Exception ex) { Console.WriteLine($"{sw.Elapsed} Exception on connect socket {i}: {ex.Message}"); }
        }
    }
    finally
    {
        ...
    }
    return availableSocket;
}
Below is the console output when this method is run with timeout = TimeSpan.FromSeconds(10):
00:00:00.0018383 Connecting socket 0...
00:00:10.0036811 Disposing socket 0...
00:02:10.7964222 Disposed 0
00:02:10.8004390 Exception on connect socket 0: Connection timed out 10.0.0.16:1433
00:02:10.8006636 Finished.
As you can see, although the Cancel() callback fires at the specified timeout, Socket.Connect() keeps running and only finishes after about 130 seconds; the Dispose() call itself does not return until then. The reason for this behavior is well described in Marek Majkowski’s article “When TCP sockets refuse to die”: https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
“…the operating system sends a SYN packet. Since it didn’t get any response the OS will by default retry sending it 6 times. The retries are staggered at 1s, 3s, 7s, 15s, 31s, 63s marks (the inter-retry time starts at 2s and then doubles each time). By default the whole process takes 130 seconds, until the kernel gives up with the ETIMEDOUT errno.”
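The retry marks in the quote follow from simple doubling. The sketch below reproduces them, assuming a first retry 1 second after the initial SYN and a gap that doubles after every retry; the class and method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;

static class SynRetrySchedule
{
    // Returns the times, in seconds since the first SYN, at which each retry is sent.
    // Assumption: the first retry comes after initialGapSeconds and the gap doubles each time.
    public static List<int> RetryMarks(int retries, int initialGapSeconds = 1)
    {
        var marks = new List<int>();
        int elapsed = 0;
        int gap = initialGapSeconds;
        for (int i = 0; i < retries; i++)
        {
            elapsed += gap;   // wait one gap, then send the next retry
            marks.Add(elapsed);
            gap *= 2;         // exponential backoff between retries
        }
        return marks;
    }
}
```

Calling SynRetrySchedule.RetryMarks(6) gives the marks 1, 3, 7, 15, 31, 63. The kernel then waits once more after the last retry before reporting ETIMEDOUT, which is where the rest of the roughly 130 seconds goes.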
To solve the problem:
Socket.Dispose() is therefore not a proper way to cancel an in-progress Socket.Connect() call. To finish the connection attempt within the specified timeout, the code should be refactored to use Socket.BeginConnect().
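A minimal sketch of what a BeginConnect-based connect with a timeout could look like. It is not the driver's code: the class name TimedConnect and its Connect signature are hypothetical, and it handles a single address only. The point is that the calling thread waits on a handle rather than inside a blocking connect call, so it regains control at the deadline.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

static class TimedConnect
{
    // Hypothetical helper: connects to one address or throws TimeoutException at the deadline.
    public static Socket Connect(IPAddress address, int port, TimeSpan timeout)
    {
        var socket = new Socket(address.AddressFamily, SocketType.Stream, ProtocolType.Tcp);
        try
        {
            // Start the connect without blocking the calling thread.
            IAsyncResult pending = socket.BeginConnect(address, port, null, null);

            // Wait for completion, but only up to the requested timeout.
            if (!pending.AsyncWaitHandle.WaitOne(timeout))
            {
                throw new TimeoutException(
                    $"Connect to {address}:{port} did not complete within {timeout}.");
            }

            // Completed in time: EndConnect rethrows any connect failure.
            socket.EndConnect(pending);
            return socket;
        }
        catch
        {
            // On timeout or failure, release the socket before propagating the error.
            socket.Dispose();
            throw;
        }
    }
}
```

A caller would use it as `TimedConnect.Connect(ipAddress, 1433, TimeSpan.FromSeconds(10))` and treat TimeoutException as the signal to move on to the next address or to the failover partner.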
Where to reproduce:
It is reproduced on all versions of System.Data.SqlClient.SqlConnection and Microsoft.Data.SqlClient.SqlConnection.
Hi @alexanderinochkin ,
I tried your suggestion to block the ServerA IP on Ubuntu and I can now reproduce the hanging behavior. If the connection timeout is increased to more than 130s such as 180s, it can connect to the failover partner in the end. In fact, even without giving the failover partner in the connection string, the connection will fail after about 130s as you have discovered.
Another thing I found is that if you try something like
Data Source=ServerA;UID=uid; PWD=pwd;Database=db;MultiSubnetFailover=True;Connect Timeout=15;Pooling=False, the driver will go into TryConnectParallel() instead of Connect(). The connection will fail after the expected timeout. However, you cannot use MultiSubnetFailover=True with Failover Partner at the same time.
When testing the managed SNI on Windows by setting the following two lines of code in your application:
the Socket.Dispose() call works as expected. However, I agree that it does not work as expected on Ubuntu. We will investigate further to see what mechanism we can use to actually dispose of the socket.
Hi @alexanderinochkin, I understand and agree that the async implementation can fix the limitation of Socket.Connect on Linux when the target resource is inaccessible, because it unblocks the working thread. However, we cannot roll back to the original async implementation because of the regression it caused in another issue I have mentioned above.
We will keep this in our backlog for further investigation until we find a proper way to fix it. For now, I am afraid that we have to keep using Socket.Connect in the driver.