
Queries with MultipleActiveResultSets=True (MARS) are very slow / time out on Linux

See original GitHub issue

Describe the bug

TL;DR:
Queries using connections with MARS enabled, even when the queries don’t use MARS, are much slower or even time out on Linux. The same queries are fast and reliable on Windows regardless of whether MARS is enabled, and on Linux when MARS is disabled.

Context

Octopus Cloud hosts Octopus Deploy instances in Linux containers on Azure AKS, with data stored in Azure Files and Azure SQL. A couple of months ago we noticed that some SQL queries became much slower or even started timing out, which we had never experienced on Windows with the full .NET Framework. Some of the slowdown might be caused by AKS (K8s), but we think SqlClient may also be playing a role. Our Azure support request number is 119112824000676, if that helps.

Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
 ---> System.ComponentModel.Win32Exception (258): Unknown error 258
   at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at Microsoft.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
   at Microsoft.Data.SqlClient.TdsParser.TdsExecuteTransactionManagerRequest(Byte[] buffer, TransactionManagerRequestType request, String transactionName, TransactionManagerIsolationLevel isoLevel, Int32 timeout, SqlInternalTransaction transaction, TdsParserStateObject stateObj, Boolean isDelegateControlRequest)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransactionYukon(TransactionRequest transactionRequest, String transactionName, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransaction(TransactionRequest transactionRequest, String name, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at Microsoft.Data.SqlClient.SqlInternalConnection.BeginSqlTransaction(IsolationLevel iso, String transactionName, Boolean shouldReconnect)
   at Microsoft.Data.SqlClient.SqlConnection.BeginTransaction(IsolationLevel iso, String transactionName)
   at Microsoft.Data.SqlClient.SqlConnection.BeginTransaction(IsolationLevel iso)
   at reprocli.Program.Scenario4(String connString, Int32 number)
   at reprocli.Program.<>c__DisplayClass0_0.<Main>b__0(Int32 n)
   at System.Linq.Parallel.ForAllOperator`1.ForAllEnumerator`1.MoveNext(TInput& currentElement, Int32& currentKey)
   at System.Linq.Parallel.ForAllSpoolingTask`2.SpoolingWork()
   at System.Linq.Parallel.SpoolingTaskBase.Work()
   at System.Linq.Parallel.QueryTask.BaseWork(Object unused)
   at System.Linq.Parallel.QueryTask.<>c.<.cctor>b__10_0(Object o)
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task.<>c.<.cctor>b__274_0(Object obj)
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
ClientConnectionId:005d2aae-9409-4711-aaa0-b03b70f2832e
Error Number:-2,State:0,Class:11
ClientConnectionId before routing:e3300799-fdd0-40a4-84ea-b9f383596b12
Routing Destination:fed2c41af7dc.tr5.westus2-a.worker.database.windows.net,11063<---

We also captured TCP dumps while running the tests on Linux, and enabling MARS appears to cause TCP RST (connection reset) packets.


Full TCP Dumps: https://github.com/benPearce1/k8s-sql-timeout-repro/tree/tiny/source/reprocli/tcpdumps

To reproduce

Code

Repo with the sample app: https://github.com/benPearce1/k8s-sql-timeout-repro/blob/tiny/source/reprocli/Program.cs. The Compiled folder contains pre-compiled versions of the app, so the .NET Core SDK doesn’t have to be installed on the target VMs.

The first parameter is the level of parallelism. The second parameter is the connection string.

using System;
using System.Data;
using System.Diagnostics;
using System.Linq;
using Microsoft.Data.SqlClient;

namespace reprocli
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                var count = int.Parse(args[0]);
                var connectionString = args[1];

                var total = Stopwatch.StartNew();

                PrepareData(connectionString);
                total.Restart();
                Enumerable.Range(0,count)
                    .AsParallel()
                    .WithDegreeOfParallelism(count)
                    .ForAll(n => Scenario4(connectionString, n));

                Console.WriteLine($"Total: {total.Elapsed}");

            }
            catch (Exception e)
            {
                Console.WriteLine(e);
                throw;
            }
        }

        private static void Scenario4(string connString, int number)
        {
            var userStopWatch = Stopwatch.StartNew();

            var buffer = new object[100];
            for (var i = 0; i < 210; i++)
            {
                var queryStopWatch = Stopwatch.StartNew();


                using (var connection = new SqlConnection(connString))
                {
                    connection.Open();
                    using (var transaction = connection.BeginTransaction(IsolationLevel.ReadCommitted))
                    {
                        using (var command = new SqlCommand("SELECT * From TestTable", connection, transaction))
                        {
                            using (var reader = command.ExecuteReader())
                            {
                                while (reader.Read())
                                {
                                    reader.GetValues(buffer);
                                }
                            }
                        }

                        transaction.Commit();
                    }
                }

                queryStopWatch.Stop();
                Console.WriteLine($"Number: {number}. Query: {i} Time: {queryStopWatch.Elapsed}");
            }

            userStopWatch.Stop();
            Console.WriteLine($"Number: {number}. All Queries. Time: {userStopWatch.Elapsed}");
        }

        static void PrepareData(string connectionString)
        {
            var createTable = @"
                DROP TABLE IF EXISTS TestTable;
                CREATE TABLE TestTable
                (
                    [Id] [nvarchar](50) NOT NULL PRIMARY KEY,
                    [Name] [nvarchar](20) NOT NULL
                );";

            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open();
                using (var transaction = connection.BeginTransaction(IsolationLevel.ReadCommitted))
                {
                    using (var command = new SqlCommand(createTable, connection, transaction))
                    {
                        command.ExecuteNonQuery();
                    }

                    transaction.Commit();
                }
            }

        }
    }
}

This is how we reproduced the problem; you don’t necessarily need this exact configuration.

The database was hosted in an Azure SQL Elastic Pool (Standard: 300 eDTUs) on a SQL Server in West US 2 region.

LINUX

Run the sample app with the following arguments on a Linux (Ubuntu 18.04) VM (Standard D8s v3: 8 vCPUs, 32 GiB memory) in the Azure West US 2 region.

MARS ON

dotnet reprocli.dll 200 'Server=tcp:YOURSERVER.database.windows.net,1433;Initial Catalog=TestDatabase;Persist Security Info=False;User ID=YOURUSER;Password=YOURPASSWORD;MultipleActiveResultSets=True;'

The expected result is that the app finishes without any errors. Instead, it throws Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

Reducing the level of parallelism to 20 stops the app from crashing.

Also, with MARS ON, the console shows no progress for more than 10 seconds. This does not happen with MARS OFF.

MARS OFF

dotnet reprocli.dll 200 'Server=tcp:YOURSERVER.database.windows.net,1433;Initial Catalog=TestDatabase;Persist Security Info=False;User ID=YOURUSER;Password=YOURPASSWORD;MultipleActiveResultSets=False;'

As expected, the app finishes without any errors, in just under 25 seconds (Total: 00:00:24.9737616). It also worked with much higher levels of parallelism (e.g. 500).

AKS

Same spec as above: Linux (Ubuntu 18.04) VM (Standard D8s v3: 8 vCPUs, 32 GiB memory) in Azure West US 2. We also ran this test in a container in AKS, and the results were essentially the same. The only difference was that we had to lower the level of parallelism even further. K8s networking adds some overhead, which may make the problem more pronounced.
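The MARS ON and MARS OFF runs above differ only in the MultipleActiveResultSets keyword. When comparing the two configurations, it can help to flip that keyword programmatically rather than editing connection strings by hand.

The sketch below is a hypothetical helper and is not part of the repro app. The name MarsToggle is illustrative. It uses System.Data.Common.DbConnectionStringBuilder from the .NET base class library instead of SqlConnectionStringBuilder, so it builds without the Microsoft.Data.SqlClient package. It assumes that an absent keyword means MARS is off, which is SqlClient's default. It also assumes that "true" or "yes" means on.

```csharp
using System;
using System.Data.Common;

// Hypothetical helper, not part of the original repro app.
// Uses the BCL DbConnectionStringBuilder so it builds without Microsoft.Data.SqlClient.
public static class MarsToggle
{
    private const string MarsKeyword = "MultipleActiveResultSets";

    // Returns a copy of the connection string with MARS explicitly set on or off;
    // every other keyword/value pair is kept.
    public static string WithMars(string connectionString, bool enabled)
    {
        var builder = new DbConnectionStringBuilder { ConnectionString = connectionString };
        // Keyword lookup is case-insensitive, so this replaces an existing
        // MultipleActiveResultSets entry instead of adding a duplicate.
        builder[MarsKeyword] = enabled ? "True" : "False";
        return builder.ConnectionString;
    }

    // Reports whether a connection string turns MARS on.
    // Assumption: a missing keyword means "off" (the SqlClient default),
    // and "true"/"yes" in any casing mean "on".
    public static bool IsMarsEnabled(string connectionString)
    {
        var builder = new DbConnectionStringBuilder { ConnectionString = connectionString };
        if (!builder.TryGetValue(MarsKeyword, out var raw) || raw is null)
        {
            return false;
        }

        var text = raw.ToString()?.Trim() ?? string.Empty;
        return text.Equals("true", StringComparison.OrdinalIgnoreCase)
            || text.Equals("yes", StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, `MarsToggle.WithMars(connectionString, enabled: false)` yields a string that could be passed as the second argument to reprocli. The builder may not preserve the original keyword casing in its output. SqlClient treats connection string keywords case-insensitively, so this should not matter.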

WINDOWS

Run the sample app with the following arguments on a Windows (Windows Server 2016 Datacenter) VM (Standard D8s v3: 8 vCPUs, 32 GiB memory) in the Azure West US 2 region.

dotnet reprocli.dll 200 'Server=tcp:YOURSERVER.database.windows.net,1433;Initial Catalog=TestDatabase;Persist Security Info=False;User ID=YOURUSER;Password=YOURPASSWORD;MultipleActiveResultSets=True;'

As expected, the app finishes without throwing an exception, in just under 24 seconds (Total: 00:00:23.4068641). It also worked with the level of parallelism set to 500, and we achieved similar results with MARS disabled.

Note: We used .NET Core to run the tests on Windows.
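A rough throughput figure helps compare the two successful runs. Each of the 200 parallel workers executes Scenario4, which loops 210 times. Each iteration opens a connection, begins a transaction, runs the SELECT, and commits. The stopwatch is restarted after PrepareData, so the reported totals cover only those iterations, console output included. This is back-of-the-envelope arithmetic on the reported totals, not a separate measurement:

$$
N = 200 \times 210 = 42{,}000 \ \text{iterations}
$$

$$
\text{Linux, MARS off: } \frac{42{,}000}{24.97\ \text{s}} \approx 1{,}682\ \text{iterations/s}
\qquad
\text{Windows, MARS on: } \frac{42{,}000}{23.41\ \text{s}} \approx 1{,}794\ \text{iterations/s}
$$

The two configurations that completed ran at comparable rates. With the same parameters, the Linux MARS ON run did not complete at all.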

Expected behavior

The sample app should not crash, and connections with MARS enabled should behave the same way on Linux and Windows.

Further technical details

  • Microsoft.Data.SqlClient version: 1.1.0 and 2.0.0-preview1.20021.1
  • .NET target: Core 2.2 and Core 3.1
  • SQL Server version: Azure SQL
  • Operating system: Ubuntu 18.04, and AKS with Ubuntu 18.04

Additional context

We’ve been battling this issue for a long time, so we are happy to help in any way we can to get it resolved.

Issue Analytics

  • State:open
  • Created 4 years ago
  • Reactions:36
  • Comments:98 (42 by maintainers)

Top GitHub Comments

5 reactions
tunger commented, Mar 27, 2020

Good that this GitHub issue exists (thanks!); we seem to have run into the same problem. It only appears when running the (ASP.NET Core + EF Core 3.1.2) app on Docker with Kubernetes with MARS on. Our background service, which handles lots of data, would simply “die”, sometimes with and sometimes without an exception being thrown. Because it is a BackgroundService/IHostedService, the web app keeps running; only the BackgroundService is gone.

I turned MARS off and now it works.

I got two kinds of exceptions. This one occurred with the default DbContext settings:

An exception occurred while iterating over the results of a query for context type '***'.
Microsoft.Data.SqlClient.SqlException (0x80131904): The connection is broken and recovery is not possible.  The connection is marked by the server as unrecoverable.  No attempt was made to restore the connection.
   at Microsoft.Data.SqlClient.SqlCommand.<>c.<ExecuteDbDataReaderAsync>b__164_0(Task`1 result)
   at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryingEnumerable`1.AsyncEnumerator.InitializeReaderAsync(DbContext _, Boolean result, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryingEnumerable`1.AsyncEnumerator.MoveNextAsync()

With the command timeout set to five minutes, I got this exception instead, the same one reported in the original issue:

An exception occurred while iterating over the results of a query for context type '***'.
Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
 ---> System.ComponentModel.Win32Exception (258): Unknown error 258
   at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at Microsoft.Data.SqlClient.SqlDataReader.TrySetMetaData(_SqlMetaDataSet metaData, Boolean moreInfo)
   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at Microsoft.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
   at Microsoft.Data.SqlClient.SqlDataReader.get_MetaData()
   at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean isAsync, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)
   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String method)
   at Microsoft.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior)
   at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReader(RelationalCommandParameterObject parameterObject)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryingEnumerable`1.Enumerator.InitializeReader(DbContext _, Boolean result)
   at Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.Execute[TState,TResult](TState state, Func`3 operation, Func`3 verifySucceeded)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryingEnumerable`1.Enumerator.MoveNext()
ClientConnectionId:3d813f87-29be-4a5a-9e6d-faff5d0e0a5f
Error Number:-2,State:0,Class:11

This issue cost us many working days of diagnosis, because nothing clearly indicates what is wrong, which hinders troubleshooting.

4 reactions
TheRockStarDBA commented, Nov 23, 2022

This bit us big time. Setting MultipleActiveResultSets=true caused lots of timeouts when running a .NET Core app in a Linux pod on K8s. Removing it from the connection string made the app very fast and responsive, and the “Connection Timeout Expired” errors are all gone.


