
[BUG] Cognitive Search Indexer Performance with Azure.Search.Documents

Describe the bug

Using Azure.Search.Documents 11.2.0-beta.2, I can add 4 files to the storage container one at a time, and each one takes about 10 seconds to index. If I recreate the index and indexer against a storage account that already contains all 4 files, indexing takes over 7 minutes to complete. No errors are reported.

Expected behavior

I would expect indexing to take roughly the same total time whether the files are added one at a time or are already present for the initial run. The observed gap is far too large to be normal variation: with 10 files, the initial run took 20 minutes.
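To make the size of the gap concrete, the reported figures can be converted into a per-file cost. This is only arithmetic on the numbers in this issue; "over 7 minutes" is treated as exactly 420 seconds, so the 4-file result is a lower bound. It works out to about 105 s per file for the 4-file run and 120 s per file for the 10-file run, roughly 10 to 12 times the ~10 s incremental cost.

```csharp
using System;

public static class IndexingCost
{
    // Average seconds spent on each file during one indexer run.
    public static double SecondsPerFile(double totalSeconds, int fileCount) =>
        totalSeconds / fileCount;

    public static void Main()
    {
        const double incrementalSecondsPerFile = 10.0;  // ~10 s per file when added one at a time

        double fourFiles = SecondsPerFile(7 * 60, 4);    // "over 7 minutes" taken as 420 s (lower bound)
        double tenFiles = SecondsPerFile(20 * 60, 10);   // 10 files in 20 minutes

        Console.WriteLine($"4 files:  {fourFiles} s/file, {fourFiles / incrementalSecondsPerFile:F1}x the incremental cost");
        Console.WriteLine($"10 files: {tenFiles} s/file, {tenFiles / incrementalSecondsPerFile:F1}x the incremental cost");
    }
}
```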

Actual behavior

No errors are produced. The indexer runs for a very long time and then eventually succeeds. This is a demo application, so indexing is controlled through two Azure Functions. The first accepts an HTTP request to start; it deletes the current index, indexer, data source connection, and skillset, and then recreates all of them. The second is bound to blob storage and triggers an indexer update whenever a new file is added.
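The blob-triggered function is described but its code is not part of the issue. The following is a hypothetical sketch of what such a function might look like, reusing the app setting names from `ResetIndexer` (`BlobContainer`, `AzureBlobConnectionString`, `SearchServiceUri`, `SearchServiceAdminApiKey`, `IndexerName`). The function name `RunIndexerOnBlob` and the 409 handling are assumptions, based on our understanding that the service rejects a run request while another run of the same indexer is still in progress.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Azure;
using Azure.Search.Documents.Indexes;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

namespace CogSearchIndexing
{
    // Hypothetical sketch: the original blob-triggered function was not posted.
    public static class RunIndexerOnBlob
    {
        [FunctionName("RunIndexerOnBlob")]
        public static async Task Run(
            [BlobTrigger("%BlobContainer%/{name}", Connection = "AzureBlobConnectionString")] Stream blob,
            string name,
            ILogger log)
        {
            log.LogInformation("New blob {BlobName}; requesting an indexer run", name);

            // App settings (and local.settings.json Values) are exposed as environment variables.
            var endpoint = new Uri(Environment.GetEnvironmentVariable("SearchServiceUri"));
            var credential = new AzureKeyCredential(Environment.GetEnvironmentVariable("SearchServiceAdminApiKey"));
            var indexerClient = new SearchIndexerClient(endpoint, credential);

            try
            {
                // Requests an on-demand run; the indexer decides which blobs are new or changed.
                await indexerClient.RunIndexerAsync(Environment.GetEnvironmentVariable("IndexerName"));
            }
            catch (RequestFailedException ex) when (IsAlreadyRunning(ex))
            {
                log.LogInformation("Indexer already running; blob {BlobName} will be processed by a later run", name);
            }
        }

        // Assumption: HTTP 409 Conflict means another run of this indexer is in progress.
        public static bool IsAlreadyRunning(RequestFailedException ex) => ex.Status == 409;
    }
}
```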

To Reproduce

Here is the problematic function, which performs the full reindexing on command:

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;
using Microsoft.Extensions.Configuration;
using Azure;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;
using SearchConstructs;
using System.Linq;
using System.Web.Http;
using Microsoft.Extensions.Azure;

namespace CogSearchIndexing
{
    public static class ResetIndexer
    {
        [FunctionName("ResetIndexer")]
        public static async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Function, "post", Route = null)] HttpRequest req,
            ILogger log, ExecutionContext context)
        {
            log.LogInformation("ResetIndexer is recreating the search index");

            var configuration = new ConfigurationBuilder()
            .SetBasePath(context.FunctionAppDirectory)
            .AddJsonFile("local.settings.json", optional: true, reloadOnChange: true)
            .AddEnvironmentVariables()
            .Build();

            Exception err;

            try
            {
                // Create a new SearchIndexClient
                Uri endpoint = new Uri(configuration["SearchServiceUri"]);
                AzureKeyCredential credential = new AzureKeyCredential(configuration["SearchServiceAdminApiKey"]);
                SearchIndexClient indexClient = new SearchIndexClient(endpoint, credential);

                // delete synchronously: the indexer runs for forever and a day, so let's see whether ensuring deletion on the same thread helps...
                // fully delete the index
                try
                {
                    indexClient.DeleteIndex(configuration["IndexName"]);
                }
                catch { } // if we fail, it's most likely that the index has already been removed, just keep going.  If it's something else, we'll fail below

                SearchIndexerClient indexerClient = new SearchIndexerClient(endpoint, credential);
                // fully delete the indexer
                try
                {
                    indexerClient.DeleteIndexer(configuration["IndexerName"]);
                }
                catch { } // if we fail, it's most likely that the indexer has already been removed, just keep going.  If it's something else, we'll fail below

                // fully delete the data source connection
                try
                {
                    indexerClient.DeleteDataSourceConnection(configuration["DataSourceConnectionName"]);
                }
                catch { } // if we fail, it's most likely that the data source connection has already been removed, just keep going.  If it's something else, we'll fail below

                // fully delete the skillset
                try
                {
                    indexerClient.DeleteSkillset(configuration["SkillsetName"]);
                }
                catch { } // if we fail, it's most likely that the skillset has already been removed, just keep going.  If it's something else, we'll fail below

                SearchIndexerDataSourceConnection dataSourceConnection = SearchConstructs.Indexer.CreateBlobDataSourceConnection(configuration["DataSourceConnectionName"], 
                    configuration["AzureBlobConnectionString"], configuration["BlobContainer"]);

                // Create or update the index
                SearchIndex index = SearchConstructs.Index.CreateIndex(configuration["IndexName"]);

                // Create or update the skillset
                SearchIndexerSkillset skillset = SearchConstructs.Skillset.CreateSkillset(configuration["SkillsetName"]);

                indexClient.CreateOrUpdateIndex(index);
                indexerClient.CreateOrUpdateDataSourceConnection(dataSourceConnection);
                indexerClient.CreateOrUpdateSkillset(skillset);

                // Create or update the indexer
                SearchIndexer indexer = SearchConstructs.Indexer.CreateIndexer(configuration["IndexerName"], configuration["DataSourceConnectionName"], 
                    configuration["IndexName"], configuration["SkillsetName"]);

                // we deleted instead of just resetting, as we sometimes change the index/indexer definitions.
                // if we had a real system, this would not delete anything, but just call reset
                // really, you would need two separate functions, one for changes and one for not
                //    await indexerClient.CreateOrUpdateIndexerAsync(indexer);
                //    await indexerClient.ResetIndexerAsync(configuration["IndexerName"]);

                indexerClient.CreateIndexer(indexer);

                // creating an indexer also runs it
                //indexerClient.RunIndexer(configuration["IndexerName"]);

                // Get and report the Search Service statistics
                Response<SearchServiceStatistics> stats = await indexClient.GetServiceStatisticsAsync();

                return new OkObjectResult($"{{\"indicesUsed\":\"{stats.Value.Counters.IndexCounter.Usage}\",\"indicesQuota\":\"{stats.Value.Counters.IndexCounter.Quota}\"}}");
            }
            catch(Exception x)
            {
                err = x;
            }

            return new ExceptionResult(err, true);
        }
    }
}
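The comments in the function note that a real system would not delete anything and would reset the indexer instead. A sketch of that path follows, assuming the `SearchIndexerClient` and `SearchIndexer` definition built in the function above; the class name `ReindexWithoutDelete` is hypothetical. As we read the Azure Cognitive Search documentation, a reset only clears the indexer's change-tracking state and does not start a run, so the sketch requests one explicitly and tolerates a run that is already in progress.

```csharp
using System.Threading.Tasks;
using Azure;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

namespace CogSearchIndexing
{
    // Hypothetical helper for the "reset instead of delete" path described in the comments.
    public static class ReindexWithoutDelete
    {
        public static async Task ResetAndRunAsync(SearchIndexerClient indexerClient, SearchIndexer indexer)
        {
            // Push any definition changes without dropping the existing indexer.
            await indexerClient.CreateOrUpdateIndexerAsync(indexer);

            // Clear change tracking so the next run reprocesses every blob.
            await indexerClient.ResetIndexerAsync(indexer.Name);

            try
            {
                await indexerClient.RunIndexerAsync(indexer.Name);
            }
            catch (RequestFailedException ex) when (ex.Status == 409)
            {
                // Assumption: 409 means a run is already in progress; check the
                // indexer status before requesting another one.
            }
        }
    }
}
```

Compared with the delete-and-recreate approach, this keeps the index contents available while the indexer reprocesses the container, but it cannot apply index schema changes that require a rebuild.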

Environment:

  • Azure.Search.Documents 11.2.0-beta.2
  • dotnet --info:

    .NET SDK (reflecting any global.json):
      Version: 5.0.102
      Commit:  71365b4d42

    Runtime Environment:
      OS Name:     Windows
      OS Version:  10.0.19042
      OS Platform: Windows
      RID:         win10-x64
      Base Path:   C:\Program Files\dotnet\sdk\5.0.102\

    Host (useful for support):
      Version: 5.0.2
      Commit:  cb5f173b96

    .NET SDKs installed:
      1.1.13 [C:\Program Files\dotnet\sdk]
      1.1.14 [C:\Program Files\dotnet\sdk]
      2.1.617 [C:\Program Files\dotnet\sdk]
      2.1.700 [C:\Program Files\dotnet\sdk]
      2.1.701 [C:\Program Files\dotnet\sdk]
      2.1.812 [C:\Program Files\dotnet\sdk]
      2.2.300 [C:\Program Files\dotnet\sdk]
      3.1.300 [C:\Program Files\dotnet\sdk]
      3.1.405 [C:\Program Files\dotnet\sdk]
      5.0.102 [C:\Program Files\dotnet\sdk]

    .NET runtimes installed:
      Microsoft.AspNetCore.All 2.1.11 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
      Microsoft.AspNetCore.All 2.1.12 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
      Microsoft.AspNetCore.All 2.1.24 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
      Microsoft.AspNetCore.All 2.2.5 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
      Microsoft.AspNetCore.App 2.1.11 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
      Microsoft.AspNetCore.App 2.1.12 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
      Microsoft.AspNetCore.App 2.1.24 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
      Microsoft.AspNetCore.App 2.2.5 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
      Microsoft.AspNetCore.App 3.1.11 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
      Microsoft.AspNetCore.App 5.0.2 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
      Microsoft.NETCore.App 1.0.15 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
      Microsoft.NETCore.App 1.0.16 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
      Microsoft.NETCore.App 1.1.12 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
      Microsoft.NETCore.App 1.1.13 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
      Microsoft.NETCore.App 2.1.11 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
      Microsoft.NETCore.App 2.1.12 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
      Microsoft.NETCore.App 2.1.24 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
      Microsoft.NETCore.App 2.2.5 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
      Microsoft.NETCore.App 3.1.11 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
      Microsoft.NETCore.App 5.0.2 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
      Microsoft.WindowsDesktop.App 3.1.11 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
      Microsoft.WindowsDesktop.App 5.0.2 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]

    To install additional .NET runtimes or SDKs:
      https://aka.ms/dotnet-download

  • IDE and version: VS Enterprise 16.8.4. The function itself returns in about 20 seconds; the slow performance occurs within the indexer run in Azure.
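Because the slow part happens inside the service rather than in the function, timing the run from the indexer's own execution history may be more informative than wall-clock estimates. A minimal sketch, assuming the Azure.Search.Documents 11.x `SearchIndexerClient.GetIndexerStatusAsync` API; the class `IndexerTiming` and its methods are hypothetical. It polls until the latest execution is no longer in progress and reports the service-recorded duration and item counts. One caveat: if it is called before a newly requested run has been recorded, `LastResult` may still describe the previous run.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Azure;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

namespace CogSearchIndexing
{
    // Hypothetical helper for measuring how long an indexer run takes in the service.
    public static class IndexerTiming
    {
        public static async Task<string> WaitForRunAsync(
            SearchIndexerClient indexerClient, string indexerName, TimeSpan pollInterval, CancellationToken ct = default)
        {
            while (true)
            {
                Response<SearchIndexerStatus> status = await indexerClient.GetIndexerStatusAsync(indexerName, ct);
                IndexerExecutionResult last = status.Value.LastResult;

                // LastResult is null until the first execution has been recorded.
                if (last != null && last.Status != IndexerExecutionStatus.InProgress)
                {
                    return Describe(last.Status.ToString(), last.StartTime, last.EndTime, last.ItemCount, last.FailedItemCount);
                }

                await Task.Delay(pollInterval, ct);
            }
        }

        // Formats one execution result; duration is unknown if either timestamp is missing.
        public static string Describe(string status, DateTimeOffset? start, DateTimeOffset? end, int items, int failed)
        {
            string duration = start.HasValue && end.HasValue
                ? $"{(end.Value - start.Value).TotalSeconds:F0} s"
                : "unknown duration";
            return $"{status}: {items} items ({failed} failed) in {duration}";
        }
    }
}
```

Comparing the reported duration and `ItemCount` for an incremental run against a full run would show whether the extra time is spent per document or elsewhere in the run.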

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 12 (8 by maintainers)

Top GitHub Comments

brjohnstmsft commented, Jun 9, 2021 (1 reaction)

@AlexGhiondea @Mohit-Chakraborty For indexer-related issues, please engage @bleroy

bleroy commented, Oct 1, 2021 (0 reactions)

Not reproducible any more.
