ChangeFeedEstimator returns wrong estimatedPendingChanges
Describe the bug
I tried to use the ChangeFeedEstimator, but got wrong numbers.
I have a container with 20K items in it. I have a single ChangeFeedProcessor that should transform them in some way, and I would like to know how many items are left to be processed. The estimator is always off by a factor of around 100: for 20K items it estimates 200-300 items.
To Reproduce
// See https://aka.ms/new-console-template for more information
using Microsoft.Azure.Cosmos;
#region Setup
var runSetup = true;
var cosmosClient = new CosmosClient("<connString>", new CosmosClientOptions() { AllowBulkExecution = true });
const string dbName = "EstimatorDb";
const string containerName = "EstimatorContainer";
const string leaseContainerName = "leases";
await cosmosClient.CreateDatabaseIfNotExistsAsync(dbName);
var database = cosmosClient.GetDatabase(dbName);
var container = database.GetContainer(containerName);
var leaseContainer = database.GetContainer(leaseContainerName);
var deleteAndRecreate = new Func<string, string, Task>(async (name, key) =>
{
    var cont = database.GetContainer(name);
    try
    {
        await cont.DeleteContainerAsync();
    }
    catch
    {
        // Nothing to do
    }
    await database.CreateContainerIfNotExistsAsync(new ContainerProperties(name, key), 6000);
});
if (runSetup)
{
    await deleteAndRecreate(containerName, "/id");
    var inserts = new List<Task>();
    foreach (var i in Enumerable.Range(0, 20_000))
    {
        var testdata = new Testdata(Guid.NewGuid().ToString(), "First", "Last");
        inserts.Add(container.CreateItemAsync(testdata, new PartitionKey(testdata.id)));
    }
    await Task.WhenAll(inserts);
}
await deleteAndRecreate(leaseContainerName, "/id");
using var iterator = container.GetItemQueryIterator<dynamic>(new QueryDefinition("SELECT count(1) as Count FROM c"));
var response = await iterator.ReadNextAsync();
long count = response.Resource.Single().Count;
Console.WriteLine($"We have {count} items in {containerName}");
#endregion
var changeFeedProcessor = container.GetChangeFeedProcessorBuilder<dynamic>("processor", OnChangesDelegate)
    .WithStartTime(DateTime.MinValue.ToUniversalTime())
    .WithInstanceName("instance")
    .WithMaxItems(1000)
    .WithPollInterval(TimeSpan.FromMilliseconds(100))
    .WithLeaseContainer(leaseContainer)
    .Build();
await changeFeedProcessor.StartAsync();
var estimator = container.GetChangeFeedEstimatorBuilder("processor", EstimationDelegate, TimeSpan.FromSeconds(1))
    .WithLeaseContainer(leaseContainer)
    .Build();
await estimator.StartAsync();
Task EstimationDelegate(long estimatedPendingChanges, CancellationToken cancellationToken)
{
    Console.WriteLine($"Estimation: {estimatedPendingChanges}");
    return Task.CompletedTask;
}
async Task OnChangesDelegate(IReadOnlyCollection<dynamic> changes, CancellationToken cancellationToken)
{
    // Do nothing, just slow down
    await Task.Delay(100);
}
Console.ReadLine();
public record Testdata (string id, string Firstname, string Lastname);
Expected behavior
The estimator reports the number of items the change feed processor still has to process (20K at the beginning).
Actual behavior
The estimator returns ~300 items and goes down to 0 while the changes are being processed.
I have tried both approaches mentioned in the docs (on-demand detailed estimation and the push model). Both give me the same numbers. I have also tried with the Cosmos DB Emulator and with a real Cosmos DB account.
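For reference, the on-demand detailed estimation mentioned above can be sketched roughly as follows. This is a minimal sketch, not the exact code used for the report: it assumes the same monitored container, lease container, and "processor" name as in the repro, and the `EstimatorSketch` class and `PrintEstimatedLagAsync` helper are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class EstimatorSketch
{
    // Hypothetical helper: reads the current estimator state once and
    // returns the sum of the per-lease estimated lag.
    // "monitored" and "leases" are assumed to be the containers from the repro.
    public static async Task<long> PrintEstimatedLagAsync(Container monitored, Container leases)
    {
        ChangeFeedEstimator estimator = monitored.GetChangeFeedEstimator("processor", leases);
        long total = 0;

        using FeedIterator<ChangeFeedProcessorState> iterator = estimator.GetCurrentStateIterator();
        while (iterator.HasMoreResults)
        {
            FeedResponse<ChangeFeedProcessorState> states = await iterator.ReadNextAsync();
            foreach (ChangeFeedProcessorState state in states)
            {
                // InstanceName is null when no processor instance currently owns the lease.
                string owner = state.InstanceName ?? "none";
                Console.WriteLine($"Lease {state.LeaseToken} (owner: {owner}): estimated lag {state.EstimatedLag}");
                total += state.EstimatedLag;
            }
        }

        return total;
    }
}
```

Calling `await EstimatorSketch.PrintEstimatedLagAsync(container, leaseContainer)` after the processor has started prints one line per lease. It needs a live account or the emulator to run.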
Environment summary
SDK Version: 3.23.0
OS Version: Win 10 Pro 20H2
Additional context No Context
Issue Analytics
- Created 2 years ago
- Reactions:2
- Comments:5 (4 by maintainers)
Top GitHub Comments
For sure, we can add it to the Estimator API xml docs
Ok, that explains why I get different estimations if I use the NodeJS SDK to insert the data.
So there is absolutely no way to get the number of remaining items? The consumption of the change feed can differ greatly from the insert batch size, and different applications use different insertion models. The estimation can be off by a factor of up to 100 if I insert with the maximum batch size, or not off at all if I insert one item at a time. This does not sound like an estimation at all…
Also, it would be nice if this were added to the docs. I would never have expected that changing a single property, AllowBulkExecution, would affect a totally different part of an application. But thanks for the explanation.
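The arithmetic behind the numbers in this thread can be sketched with a small model. It is a simplified illustration, not SDK code. It assumes, as the replies above suggest, that the estimator counts pending write batches rather than individual items, and that a bulk batch holds up to 100 items. The function `EstimateFromSequenceNumbers` and the single-partition setup are hypothetical.

```csharp
using System;

// Hypothetical model: every committed batch advances a per-partition
// sequence number by one, no matter how many items the batch contains.
// The estimate is the gap between the latest sequence number and the
// last one the processor has checkpointed.
static long EstimateFromSequenceNumbers(int totalItems, int itemsPerBatch)
{
    // Ceiling division: number of batches needed to write all items.
    long batches = (totalItems + itemsPerBatch - 1) / itemsPerBatch;
    long latestSequenceNumber = batches;
    long lastProcessedSequenceNumber = 0; // processor has not consumed anything yet
    return latestSequenceNumber - lastProcessedSequenceNumber;
}

Console.WriteLine(EstimateFromSequenceNumbers(20_000, 1));   // 20000: one item per write
Console.WriteLine(EstimateFromSequenceNumbers(20_000, 100)); // 200: full bulk batches
Console.WriteLine(EstimateFromSequenceNumbers(20_000, 75));  // 267: partially filled batches
```

Under these assumptions, 20K items written in batches of 75 to 100 items give an estimate of roughly 200-300, which matches the range reported in the issue, while one-item-at-a-time inserts give an estimate equal to the item count.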