
ChangeFeedEstimator returns wrong estimatedPendingChanges


We are continuously addressing and improving the SDK. If possible, make sure the problem persists in the latest SDK version.

Describe the bug

I tried to use the ChangeFeedEstimator, but got wrong numbers.

I have a container with 20K items in it. I have a single ChangeFeedProcessor that should mangle them in some way, and I would like to know how many items are left to be processed. The estimator is always off by around 100x (for 20K items, it estimates 200-300 items).

To Reproduce

// See https://aka.ms/new-console-template for more information
using Microsoft.Azure.Cosmos;

#region Setup
var runSetup = true;
var cosmosClient = new CosmosClient("<connString>", new CosmosClientOptions() { AllowBulkExecution = true });

const string dbName = "EstimatorDb";
const string containerName = "EstimatorContainer";
const string leaseContainerName = "leases";

await cosmosClient.CreateDatabaseIfNotExistsAsync(dbName);
var database = cosmosClient.GetDatabase(dbName);

var container = database.GetContainer(containerName);
var leaseContainer = database.GetContainer(leaseContainerName);

var deleteAndRecreate = new Func<string, string, Task>(async (name, key) =>
{
    var cont = database.GetContainer(name);
    try
    {
        await cont.DeleteContainerAsync();
    }
    catch
    {
        // Nothing to do
    }
    await database.CreateContainerIfNotExistsAsync(new ContainerProperties(name, key), 6000);
});

if (runSetup)
{
    await deleteAndRecreate(containerName, "/id");
    var inserts = new List<Task>();
    foreach (var i in Enumerable.Range(0, 20_000))
    {
        var testdata = new Testdata(Guid.NewGuid().ToString(), "First", "Last");
        inserts.Add(container.CreateItemAsync(testdata, new PartitionKey(testdata.id)));
    }
    await Task.WhenAll(inserts);
}

await deleteAndRecreate(leaseContainerName, "/id");

using var iterator = container.GetItemQueryIterator<dynamic>(new QueryDefinition("SELECT count(1) as Count FROM c"));
var response = await iterator.ReadNextAsync();
long count = response.Resource.Single().Count;

Console.WriteLine($"We have {count} items in {containerName}");

#endregion

var changeFeedProcessor = container.GetChangeFeedProcessorBuilder<dynamic>("processor", OnChangesDelegate)
                                    .WithStartTime(DateTime.MinValue.ToUniversalTime())
                                    .WithInstanceName("instance")
                                    .WithMaxItems(1000)
                                    .WithPollInterval(TimeSpan.FromMilliseconds(100))
                                    .WithLeaseContainer(leaseContainer)
                                    .Build();
await changeFeedProcessor.StartAsync();

var estimator = container.GetChangeFeedEstimatorBuilder("processor", EstimationDelegate, TimeSpan.FromSeconds(1))
                                    .WithLeaseContainer(leaseContainer)
                                    .Build();

await estimator.StartAsync();

Task EstimationDelegate(long estimatedPendingChanges, CancellationToken cancellationToken)
{
    Console.WriteLine($"Estimation: {estimatedPendingChanges}");
    return Task.CompletedTask;
}

async Task OnChangesDelegate(IReadOnlyCollection<dynamic> changes, CancellationToken cancellationToken)
{
    // Do Nothing just slow down
    await Task.Delay(100);
}

Console.ReadLine();

public record Testdata (string id, string Firstname, string Lastname);

Expected behavior

I get back the correct number of items the change feed still needs to process (20K in the beginning).

Actual behavior

The estimator returns ~300 items and drops to 0 while the changes are being processed.

I have tried both approaches mentioned in the docs (on-demand detailed estimation and the push model). Both give me the same numbers. I have also tried with the CosmosDB Emulator and with a real CosmosDB.

Environment summary

SDK Version: 3.23.0
OS Version: Win 10 Pro 20H2

Additional context

No Context

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
ealsur commented, Jan 25, 2022

For sure, we can add it to the Estimator API XML docs.

0 reactions
firewave-remo commented, Jan 25, 2022

Ok, that explains why I get different estimations if I use the NodeJS SDK to insert the data.

So there is absolutely no way to get the number of remaining items? The consumption of the change feed can differ greatly from the insert batch size, and different applications use different insertion models. So the estimation can be off by a factor of up to 100 if I use the max batch size, or not off at all if I insert one item at a time. This does not sound like an estimation at all…

Also, it would be nice if this were added to the docs. I would never have expected that changing a single property, AllowBulkExecution, would affect a totally different part of an application. But thanks for the explanation.
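
The factor described in this comment can be illustrated with plain arithmetic. The following is a minimal sketch, not SDK code: it assumes, as the comment suggests, that the estimator counts one pending change per insert batch rather than per item, and that a bulk batch holds at most 100 items. `EstimatorSketch` and `EstimateFromBatches` are hypothetical names introduced only for illustration.

```csharp
using System;

public static class EstimatorSketch
{
    // Hypothetical helper (not part of the Cosmos DB SDK).
    // Assumption: each insert batch is counted as a single pending change,
    // so the reported estimate is the number of batches, not the number of items.
    public static long EstimateFromBatches(long itemCount, int itemsPerBatch)
    {
        if (itemCount < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(itemCount));
        }
        if (itemsPerBatch <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(itemsPerBatch));
        }

        // Ceiling division: a partially filled batch still counts as one change.
        return (itemCount + itemsPerBatch - 1) / itemsPerBatch;
    }
}
```

Under these assumptions, `EstimatorSketch.EstimateFromBatches(20_000, 100)` gives 200, which is in line with the 200-300 reported in the issue, while `EstimatorSketch.EstimateFromBatches(20_000, 1)` gives 20000, the case where items are inserted one at a time.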
