SqlConnection.Open() does not successfully connect to the specified Failover Partner on Linux.


The main problem is:

We use a SQL connection string with a Failover Partner server and Connect Timeout = 30. The program runs on Linux (Ubuntu 19.10). When the Data Source server is unavailable (turned off or blocked by a firewall), SqlConnection.Open() does not connect to the Failover Partner server, even though that server is available and acting as the SQL principal.

The reason description:

We traced the problem to the SNITCPHandle.Connect(…) method, which does not abort the connection attempt when the specified timeout expires: https://github.com/dotnet/SqlClient/blob/3ca39848735143055d9d7d4864d5f1bfd1976b99/src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SNI/SNITcpHandle.cs

It creates a CancellationTokenSource with a Cancel() callback that disposes the sockets, but this does not actually cancel the in-progress socket.Connect() call.
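The general pattern can be illustrated in isolation. The sketch below is not SqlClient code: `CancelCallbackSketch` and `Run` are hypothetical names, and `Thread.Sleep` stands in for a blocking `Socket.Connect()` call. It only shows that a callback registered on a timed `CancellationTokenSource` runs on another thread and does not, by itself, interrupt a blocking call on the calling thread.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical illustration, not part of SqlClient.
static class CancelCallbackSketch
{
    // Returns the time at which the cancellation callback ran and the time at
    // which the blocking call returned. callbackAt is negative if the callback
    // never ran.
    public static (TimeSpan callbackAt, TimeSpan returnedAt) Run(TimeSpan timeout, TimeSpan blockFor)
    {
        var sw = Stopwatch.StartNew();
        long callbackTicks = -1;

        using var cts = new CancellationTokenSource(timeout);
        // The callback is invoked on a timer thread once the timeout elapses.
        cts.Token.Register(() => Interlocked.Exchange(ref callbackTicks, sw.Elapsed.Ticks));

        // Stand-in for a blocking Socket.Connect(): nothing here observes the token,
        // so the call runs to completion regardless of the callback.
        Thread.Sleep(blockFor);

        return (TimeSpan.FromTicks(Interlocked.Read(ref callbackTicks)), sw.Elapsed);
    }
}
```

Calling `Run` with a short timeout and a longer `blockFor` should report a callback time near the timeout and a return time near `blockFor`, which mirrors the shape of the log output in this issue.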

Here is the relevant part of the code, with additional debug logging:

private static Socket Connect(string serverName, int port, TimeSpan timeout, bool isInfiniteTimeout)
{
	var sw = Stopwatch.StartNew();

	...

	CancellationTokenSource cts = null;

	void Cancel()
	{
		for (int i = 0; i < sockets.Length; ++i)
		{
			try
			{
				if (sockets[i] != null && !sockets[i].Connected)
				{
					Console.WriteLine($"{sw.Elapsed} Disposing socket {i}...");
					sockets[i].Dispose();
					Console.WriteLine($"{sw.Elapsed} Disposed {i}");
					sockets[i] = null;
				}
			}
			catch { }
		}
	}

	if (!isInfiniteTimeout)
	{
		cts = new CancellationTokenSource(timeout);
		cts.Token.Register(Cancel);
	}

	Socket availableSocket = null;
	try
	{
		for (int i = 0; i < sockets.Length; ++i)
		{
			try
			{
				if (ipAddresses[i] != null)
				{
					sockets[i] = new Socket(ipAddresses[i].AddressFamily, SocketType.Stream, ProtocolType.Tcp);
					Console.WriteLine($"{sw.Elapsed} Connecting socket {i}...");
					sockets[i].Connect(ipAddresses[i], port);
					Console.WriteLine($"{sw.Elapsed} Connect socket finished {i}.");
					
					...
				}
			}
			catch (Exception ex) { Console.WriteLine($"{sw.Elapsed} Exception on connect socket {i}: {ex.Message}"); }
		}
	}
	finally
	{
		...
	}

	return availableSocket;
} 

Below is the console output when this method is run with timeout = TimeSpan.FromSeconds(10):

00:00:00.0018383 Connecting socket 0...
00:00:10.0036811 Disposing socket 0...
00:02:10.7964222 Disposed 0
00:02:10.8004390 Exception on connect socket 0: Connection timed out 10.0.0.16:1433
00:02:10.8006636 Finished.

As you can see, although the Cancel() callback fires at the specified timeout, Socket.Connect() keeps running and only fails after about 130 seconds; the Dispose() call itself blocks until then. The reason for this behavior is well described in Marek Majkowski’s article “When TCP sockets refuse to die”: https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/

“…the operating system sends a SYN packet. Since it didn’t get any response the OS will by default retry sending it 6 times. The retries are staggered at 1s, 3s, 7s, 15s, 31s, 63s marks (the inter-retry time starts at 2s and then doubles each time). By default the whole process takes 130 seconds, until the kernel gives up with the ETIMEDOUT errno.”
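The retry marks in the quote can be reproduced with a little arithmetic. This is a simplified model, not kernel code: it assumes the first retry is sent 1 s after the initial SYN and that each wait is double the previous one; `SynRetrySchedule` and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper modelling the SYN retry schedule described in the quote.
static class SynRetrySchedule
{
    // Offsets, in seconds from the first SYN, at which each retry is sent.
    public static List<int> RetryMarks(int retries, int initialWaitSeconds = 1)
    {
        var marks = new List<int>();
        int elapsed = 0;
        int wait = initialWaitSeconds;
        for (int i = 0; i < retries; i++)
        {
            elapsed += wait;   // wait, then send the next SYN
            marks.Add(elapsed);
            wait *= 2;         // exponential backoff
        }
        return marks;
    }

    // Total time until the attempt is abandoned: the last retry mark plus
    // one more (doubled) wait for a reply that never arrives.
    public static int GiveUpAfterSeconds(int retries, int initialWaitSeconds = 1)
    {
        return initialWaitSeconds * ((1 << (retries + 1)) - 1);
    }
}
```

With 6 retries, `RetryMarks` yields 1, 3, 7, 15, 31, 63 and `GiveUpAfterSeconds` yields 127 under this model, which is in line with the roughly 130 seconds quoted above and seen in the log.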

To solve the problem:

In short, Socket.Dispose() is not a proper way to cancel an in-progress Socket.Connect() call. To make the connection attempt finish within the specified timeout, the code should be refactored to use Socket.BeginConnect().
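A minimal sketch of that approach follows. It is not the driver's code: `ConnectWithTimeoutSketch` and `TryConnect` are hypothetical names, and the sketch assumes that a single address is tried and that returning null on failure is acceptable. The calling thread waits on the async handle rather than inside a blocking connect, so it can give up when the timeout expires.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper, not part of SqlClient.
static class ConnectWithTimeoutSketch
{
    // Returns a connected socket, or null if the attempt fails or times out.
    public static Socket TryConnect(IPAddress address, int port, TimeSpan timeout)
    {
        var socket = new Socket(address.AddressFamily, SocketType.Stream, ProtocolType.Tcp);
        try
        {
            // Start the connect without blocking the calling thread.
            IAsyncResult pending = socket.BeginConnect(address, port, null, null);

            // Wait for completion, but no longer than the caller's timeout.
            if (!pending.AsyncWaitHandle.WaitOne(timeout))
            {
                // Timed out: abandon the attempt and release the socket.
                socket.Dispose();
                return null;
            }

            // Completed in time: EndConnect throws if the connect failed.
            socket.EndConnect(pending);
            return socket;
        }
        catch (SocketException)
        {
            socket.Dispose();
            return null;
        }
    }
}
```

In the scenario from this issue, the intent is that a call such as `TryConnect(IPAddress.Parse("10.0.0.16"), 1433, TimeSpan.FromSeconds(10))` returns null after about 10 seconds, leaving the caller free to try the Failover Partner.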

Where to reproduce:

The problem reproduces with all versions of System.Data.SqlClient.SqlConnection and Microsoft.Data.SqlClient.SqlConnection.

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 11 (8 by maintainers)

Top GitHub Comments

1 reaction
karinazhou commented, Jul 7, 2020

Hi @alexanderinochkin ,

I tried your suggestion to block the ServerA IP on Ubuntu and I can now reproduce the hanging behavior. If the connection timeout is increased to more than 130s such as 180s, it can connect to the failover partner in the end. In fact, even without giving the failover partner in the connection string, the connection will fail after about 130s as you have discovered.

Another thing I found is that if you try something like Data Source=ServerA;UID=uid; PWD=pwd;Database=db;MultiSubnetFailover=True;Connect Timeout=15;Pooling=False, the driver will go into TryConnectParallel() instead of Connect(). The connection will fail after the expected timeout. However, you cannot use MultiSubnetFailover=True with Failover Partner at the same time.

When testing the managed SNI on Windows by adding the following two lines of code to your application:

string ManagedNetworkingAppContextSwitch = "Switch.Microsoft.Data.SqlClient.UseManagedNetworkingOnWindows";
AppContext.SetSwitch(ManagedNetworkingAppContextSwitch, true);

Socket.Dispose() works as expected. However, I agree that it does not work as expected on Ubuntu. We will investigate further to see what mechanism we can use to actually dispose of the socket.

0 reactions
karinazhou commented, Sep 15, 2020

Hi @alexanderinochkin , I understand and agree that the async implementation can work around the limitation of Socket.Connect on Linux when the target resource is inaccessible, because it unblocks the worker thread. However, we cannot roll back to the original async implementation because of the regression it caused in another issue I have mentioned above.

We will keep this in our backlog for further investigation until we find a proper way to fix it. For now, I am afraid that we have to keep using Socket.Connect in the driver.
