Performance issue with response time when upgrading from 3.31.0 to 3.31.2
Describe the bug Increased response time since updating from 3.31.0 to 3.31.2. This impacted around 5% of http requests.
To Reproduce When we updated from Microsoft.Azure.Cosmos 3.31.0 to 3.31.2 we noticed an increase in response time for our HTTP requests. We use a single Cosmos client instance in our services.
The code that talks to Cosmos is a cross-partition query, see below:
public async Task<IEnumerable<T>> GetAll(Expression<Func<T, bool>> predicate)
{
    // Build the LINQ query: filter by document kind, then by the caller's predicate.
    var iterator = container.GetItemLinqQueryable<T>()
        .Where(entity => entity.Kind == DocumentKind)
        .Where(predicate)
        .ToFeedIterator();

    // Drain every page of results into a single list.
    var entities = new List<T>();
    while (iterator.HasMoreResults)
    {
        var page = await iterator.ReadNextAsync();
        entities.AddRange(page);
    }
    return entities;
}
The function is used as follows:
GetAll(item => item.Key == value)
The container has roughly 5 million items and is partitioned.
Expected behavior We expect the upgrade to not impact performance.
Actual behavior An increase in response time for http requests.
The graph shows the 99th percentile response time in milliseconds for one of our endpoints. At the 50th percentile there is no noticeable difference. The peaks occur with SDK version 3.31.2; when we downgraded to 3.31.0, the response time returned to its usual duration. The second peak is when we upgraded to 3.31.2 again to confirm that the SDK version was the cause. This has been verified on both .NET 6 and .NET 7. The only change between the peaks is upgrading/downgrading Microsoft.Azure.Cosmos.
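To illustrate why a slowdown that affects about 5% of requests shows up at the 99th percentile but not at the 50th, here is a minimal sketch using the nearest-rank percentile method. The latency values (95 requests at 50 ms, 5 requests at 900 ms) and the names `LatencyPercentiles`, `Percentile`, and `SampleLatencies` are hypothetical and chosen only for illustration; they are not measurements from our services.

```csharp
using System;
using System.Linq;

public static class LatencyPercentiles
{
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    public static double Percentile(double[] samples, double p)
    {
        if (samples.Length == 0)
        {
            throw new ArgumentException("No samples.", nameof(samples));
        }

        var sorted = samples.OrderBy(x => x).ToArray();
        // Multiply before dividing to keep the rank computation exact for whole-number inputs.
        var rank = (int)Math.Ceiling(p * sorted.Length / 100.0);
        return sorted[Math.Max(rank, 1) - 1];
    }

    // Hypothetical data: 95 fast requests (50 ms) and 5 slow requests (900 ms).
    public static double[] SampleLatencies() =>
        Enumerable.Repeat(50.0, 95)
            .Concat(Enumerable.Repeat(900.0, 5))
            .ToArray();

    public static void Main()
    {
        var latencies = SampleLatencies();
        Console.WriteLine($"p50 = {Percentile(latencies, 50)} ms");
        Console.WriteLine($"p99 = {Percentile(latencies, 99)} ms");
    }
}
```

With this made-up distribution the median stays at the fast value while the 99th percentile lands on a slow request, which matches the shape of what we observed: a small share of slow requests is invisible at p50 and dominates p99.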
Environment summary SDK Version: .NET 7.0.1 and .NET 6.0.12; OS Version: debian.11-x64
Issue Analytics
- Created 9 months ago
- Comments:12 (6 by maintainers)
Top GitHub Comments
We have been working on this issue for a while and have a few observations:
We have tried updating to 3.31.1 and above, but the 99th percentile problem still occurs. What did solve it was moving from the P1V2 app service plan to the P1V3 app service plan, which allowed us to update the SDK without any performance issues. The main difference seems to be that we now have two cores on each instance. We are not sure why this matters, since we rarely had a high CPU percentage and did not appear to have threading issues.
Cleaning up issues with needs-investigation that are dormant/not-actionable