Mapping exceptions after an incident where a lot of database calls were made (without recovering)
Describe what is not working as expected.
There was an incident on our production environment. We have two instances of the same application (backend). A client made around 2.5k requests at the same time (1.25k / app).
The first issue that appeared was the following exception:
System.Data.Entity.Core.EntityException: The underlying provider failed on Open.
---> System.InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
at async System.Data.Common.ADP.ExceptionWithStackTrace(Exception e)
at async System.Data.Entity.Infrastructure.DbExecutionStrategy.<>c__DisplayClass19_0.<<ExecuteAsync>b__0>d.MoveNext()
at System.Data.Entity.Infrastructure.DbExecutionStrategy.ProtectedExecuteAsync<TResult>(Func<T> operation, CancellationToken cancellationToken)
at System.Data.Entity.Core.EntityClient.EntityConnection.OpenAsync(CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at System.Data.Entity.Core.EntityClient.EntityConnection.OpenAsync(CancellationToken cancellationToken)
at System.Data.Entity.Core.Objects.ObjectContext.EnsureConnectionAsync(Boolean shouldMonitorTransactions, CancellationToken cancellationToken)
at System.Data.Entity.Core.Objects.ObjectContext.ExecuteInTransactionAsync<T>(Func<T> func, IDbExecutionStrategy executionStrategy, Boolean startLocalTransaction, Boolean releaseConnectionOnSuccess, CancellationToken cancellationToken)
at System.Data.Entity.Utilities.TaskExtensions.CultureAwaiter<T>.GetResult()
at System.Data.Entity.Infrastructure.DbExecutionStrategy.ProtectedExecuteAsync<TResult>(Func<T> operation, CancellationToken cancellationToken)
at System.Data.Entity.Utilities.TaskExtensions.CultureAwaiter<T>.GetResult()
at System.Data.Entity.Core.Objects.ObjectQuery<T>.GetResultsAsync(Nullable<T> forMergeOption, IDbExecutionStrategy executionStrategy, CancellationToken cancellationToken)
at System.Data.Entity.Utilities.TaskExtensions.CultureAwaiter<T>.GetResult()
at System.Data.Entity.Internal.LazyAsyncEnumerator<T>.FirstMoveNextAsync(CancellationToken cancellationToken)
at System.Data.Entity.Infrastructure.IDbAsyncEnumerableExtensions.ForEachAsync<T>(IDbAsyncEnumerator<T> enumerator, Action<T> action, CancellationToken cancellationToken)
That is expected, given the number of concurrent HTTP requests. After that, both of our applications started throwing a lot of seemingly random exceptions, as you can see in the following picture:
Those exceptions do not make any sense. I even saw two exceptions for the same property: one trying to map it to a boolean, and another trying to map it to a decimal (instead of a string).
It feels like Entity Framework cannot map the retrieved data to the correct data types. It seems like everything is scrambled.
The major problem is that it does not recover. The incident lasted around 30-40 minutes, and simply recreating the pod fixed it.
I will paste some stack traces below:
System.InvalidOperationException: The 'x' property on 'y' could not be set to a 'System.String' value. You must set this property to a non-null value of type 'System.Int32'.
at System.Data.Entity.Core.Common.Internal.Materialization.Shaper.ErrorHandlingValueReader<T>.GetValue(DbDataReader reader, Int32 ordinal)
at lambda_method(Closure , Shaper )
at System.Data.Entity.Core.Common.Internal.Materialization.Shaper.HandleEntityAppendOnly<TEntity>(Func<T1,T2> constructEntityDelegate, EntityKey entityKey, EntitySet entitySet)
at lambda_method(Closure , Shaper )
at System.Data.Entity.Core.Common.Internal.Materialization.Coordinator<T>.ReadNextElement(Shaper shaper)
at System.Data.Entity.Core.Common.Internal.Materialization.Shaper<T>.SimpleEnumerator.MoveNextAsync(CancellationToken cancellationToken)
at System.Data.Entity.Internal.LazyAsyncEnumerator<T>.FirstMoveNextAsync(CancellationToken cancellationToken)
at System.Data.Entity.Infrastructure.IDbAsyncEnumerableExtensions.FirstOrDefaultAsync<TSource>(IDbAsyncEnumerable<T> source, CancellationToken cancellationToken)
System.IndexOutOfRangeException: Index was outside the bounds of the array.
at System.Data.SqlClient.SqlDataReader.CheckHeaderIsReady(Int32 columnIndex, Boolean permitAsync, String methodName)
at System.Data.SqlClient.SqlDataReader.IsDBNull(Int32 i)
at lambda_method(Closure , Shaper )
at System.Data.Entity.Core.Common.Internal.Materialization.Coordinator.HasNextElement(Shaper shaper)
at System.Data.Entity.Core.Common.Internal.Materialization.Shaper<T>.RowNestedResultEnumerator.MaterializeRow()
at System.Data.Entity.Core.Common.Internal.Materialization.Shaper<T>.RowNestedResultEnumerator.MoveNext()
at System.Data.Entity.Core.Common.Internal.Materialization.Shaper<T>.ObjectQueryNestedEnumerator.TryReadToNextElement()
at System.Data.Entity.Core.Common.Internal.Materialization.Shaper<T>.ObjectQueryNestedEnumerator.MoveNext()
at System.Collections.Generic.List<T>..ctor(IEnumerable<T> collection)
at System.Linq.Enumerable.ToList<TSource>(IEnumerable<T> source)
System.InvalidOperationException: The specified cast from a materialized 'System.String' type to the 'System.Int32' type is not valid.
at System.Data.Entity.Core.Common.Internal.Materialization.Shaper.ErrorHandlingValueReader<T>.GetValue(DbDataReader reader, Int32 ordinal)
at lambda_method(Closure , Shaper )
at System.Data.Entity.Core.Common.Internal.Materialization.Coordinator<T>.ReadNextElement(Shaper shaper)
at System.Data.Entity.Core.Common.Internal.Materialization.Shaper<T>.SimpleEnumerator.MoveNextAsync(CancellationToken cancellationToken)
at System.Data.Entity.Internal.LazyAsyncEnumerator<T>.FirstMoveNextAsync(CancellationToken cancellationToken)
at System.Data.Entity.Infrastructure.IDbAsyncEnumerableExtensions.FirstOrDefaultAsync<TSource>(IDbAsyncEnumerable<T> source, CancellationToken cancellationToken)
Useful information
Our context:
public partial class XEntities : DbContext
{
    public XEntities() : base("name=Entities")
    {
        Database.SetInitializer<XEntities>(null);
    }

    public XEntities(string connectionName) : base(connectionName)
    {
        Database.SetInitializer<XEntities>(null);
    }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        XModelKeyBuilder.BuildCompositeKeys(modelBuilder);
    }

    //dbsets...
}
DatabaseFactory:
public class DatabaseFactory : Disposable, IDatabaseFactory
{
    private DbContext dataContext;

    public DbContext Get()
    {
        if (dataContext != null) return dataContext;
        dataContext = new XEntities(connString);
        dataContext.Configuration.UseDatabaseNullSemantics = true;
        return dataContext;
    }

    protected override void DisposeCore()
    {
        if (dataContext != null)
            dataContext.Dispose();
    }
}
Unfortunately, we use another layer above EF: the Generic Repository and Unit of Work patterns, where we inject the IDatabaseFactory into the constructor.
We use Microsoft.Extensions.DependencyInjection.Abstractions 3.1.2 for DI. The registrations are Scoped. We use async all the way down.
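For readers trying to reproduce the setup, the scoped registration described above might look roughly like the sketch below. It is an illustration only: the IDatabaseFactory and DatabaseFactory shapes are simplified stand-ins for the types shown earlier, and AddDataAccess is a hypothetical helper name, not something taken from the project.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

// Simplified stand-in (assumption): the real IDatabaseFactory returns a DbContext.
public interface IDatabaseFactory : IDisposable
{
    object Get();
}

public sealed class DatabaseFactory : IDatabaseFactory
{
    private object dataContext;

    // Lazily creates a single context per factory instance.
    // "new object()" is a placeholder for "new XEntities(connString)".
    public object Get()
    {
        if (dataContext == null) dataContext = new object();
        return dataContext;
    }

    public void Dispose()
    {
        (dataContext as IDisposable)?.Dispose();
    }
}

public static class DataAccessRegistration
{
    // Hypothetical helper: registers the factory with a scoped lifetime, so each
    // HTTP request scope gets its own factory and therefore its own context.
    public static IServiceCollection AddDataAccess(this IServiceCollection services)
    {
        services.AddScoped<IDatabaseFactory, DatabaseFactory>();
        return services;
    }
}
```

With a scoped lifetime, the container disposes the factory when the request scope ends. A DbContext is not thread-safe, so it is worth ruling out any code path in which one scoped instance is used by concurrent operations (for example, several un-awaited queries running against the same repository).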
People with the same problem:
https://stackoverflow.com/questions/50560257/entity-framework-throws-unexpected-exceptions-with-a-heavy-workload
https://stackoverflow.com/questions/35896100/strange-error-in-sql-server-from-asp-net-app
https://stackoverflow.com/questions/47076273/getting-exception-suddenly-from-entityframework
https://stackoverflow.com/questions/41839754/entity-framework-6-1-3-invalidoperationexception-after-a-few-days-of-running
https://stackoverflow.com/questions/35011086/ef-randomly-sees-wrong-property-type
Further technical details
- ASP.NET Core 3.1
- EF 6.4 (database first)
- Kubernetes 1.16.7 (AKS)
- Docker image - aspnet:3.1-alpine
- Azure SQL (Standard S3: 100 DTUs)
I have a .NET Framework project with the EDMX. After I generate the entities, I run a PowerShell script to move them to the .NET Standard 2.1 project.
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:14 (2 by maintainers)
@jdaigle The incident appeared again this morning.
To keep it straightforward, the client apps made A LOT of requests to our APIs, and those APIs hit big database tables.
We have 3 pods for our app. All of a sudden, a lot of timeouts started appearing, and since the client apps did not block the UI, they kept making requests to our apps and forced us to keep hitting our database. That basically means we had no chance of recovering because the client apps didn’t allow us to. To speed up the recovery, I killed 2 of the 3 pods (so that 2 new ones would be created), and the timeouts started to disappear (since that got rid of our scheduled DB calls, which are caused mainly by the triggered API calls).
The weird thing is that the issue we discussed in this thread appeared only on a newly created pod (one of the two):
I ended up scaling the Azure SQL database from S3 to S4 and everything is fine now.
I created a timeline showing the Apdex score (all 3 pods combined) during the incident:
In the following 2 screenshots you can see the metrics of the pod which threw mapping exceptions:

What I found out:
I would like to contribute my experience with what I think is the exact same issue (for which I’ve never found the root cause or a resolution):
I’ve been seeing these exceptions in our production systems for nearly 3 years. Early on, these exceptions happened frequently in clusters. Sometimes the cluster would last for 5 minutes, sometimes up to 30 minutes. It almost always affects just a single server/instance at a time.
In the past 3 years, the occurrence rate of these exceptions has gone down dramatically. It’s usually just a small “blip”, and users rarely notice since things recover quickly. However, just yesterday (August 24, 2020) we encountered a cluster of these exceptions that affected 1/3 of our production servers over a period of 30-40 minutes. It was bad enough that users noticed this time: from their perspective the system was down, since so many things were crashing.
Usually this problem eventually resolves itself after a few minutes. It’s almost as if some/most of the connections in the connection pool are corrupt or in some bad state? And after a while the connection pool clears itself out.
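One way to probe the corrupt-pool hypothesis would be to reset the ADO.NET connection pools when one of these materialization errors is observed, and then check whether the cluster of exceptions ends sooner. The sketch below is an assumption and not a confirmed fix: PoolResetGuard and its message checks are hypothetical and are based only on the exception texts quoted in this thread. The only library call it relies on is SqlConnection.ClearAllPools() from System.Data.SqlClient.

```csharp
using System;
using System.Data.SqlClient;

public static class PoolResetGuard
{
    // Heuristic (assumption): matches the exception shapes quoted in this thread.
    public static bool LooksLikeMaterializationFault(Exception ex)
    {
        if (ex is IndexOutOfRangeException)
        {
            // Only count it when it was thrown from inside SqlDataReader.
            return ex.StackTrace != null && ex.StackTrace.Contains("SqlDataReader");
        }

        return ex is InvalidOperationException
            && (ex.Message.Contains("could not be set to a")
                || ex.Message.Contains("The specified cast from a materialized"));
    }

    // Clears every SqlClient pool when the exception looks suspicious.
    // Connections currently in use are discarded when they are closed,
    // instead of being returned to the pool.
    public static bool ResetPoolsIfSuspicious(Exception ex)
    {
        if (!LooksLikeMaterializationFault(ex)) return false;
        SqlConnection.ClearAllPools();
        return true;
    }
}
```

Such a guard could be called from a global exception handler or middleware. If the errors persist after the pools have been cleared, that would suggest the bad state lives somewhere other than the pooled connections.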
Some notes about application code:
Technical Details
Exceptions/Stack Trace Sample
Below is a sample of the types of exceptions we see. We’ll often see dozens or hundreds of these exceptions. They are all for different queries and different entities, some in async code and some in non-async code.
We also have some transaction-related errors in the mix. Some of these exceptions are particularly weird because they’re being thrown from queries which aren’t being executed in any sort of transaction (no local transaction, no TransactionScope). We do have code that uses local transactions, and we do use TransactionScope in a few specific places.