[BUG] Cognitive Search Indexer Performance with Azure.Search.Documents
Describe the bug
Using Azure.Search.Documents 11.2.0-beta.2, I can add 4 individual files to be indexed, and it takes about 10 seconds per file. If I recreate the index and indexer against a storage account with all 4 files already present, it takes over 7 minutes to complete indexing. No errors are reported.
Expected behavior
I would expect the time to index to be roughly the same whether the files are added one at a time or are all present for the initial indexing. The difference (about 40 seconds in total versus more than 7 minutes) is far too large to be noise. I also tried with 10 files, and it took 20 minutes.
Actual behavior (include Exception or Stack Trace)
No errors are produced. The indexer runs for a very long time and then eventually succeeds. This is for a demo application, so indexing is controlled through two Azure Functions. The first accepts an HTTP request to start; it then deletes the current index, indexer, data source connection, and skillset and recreates everything from scratch. The second is bound to blob storage and causes the indexer to update when a new file is added.
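For context, the blob-bound function is not shown in this report. A minimal sketch of what such a function might look like follows. The function name, the container path "documents/{name}", and the handling of HTTP 409 as "a run is already in progress" are assumptions for illustration, not the reporter's actual code; the setting names are reused from the function shown under To Reproduce.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Azure;
using Azure.Search.Documents.Indexes;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

namespace CogSearchIndexing
{
    // Hypothetical sketch: runs the existing indexer whenever a blob is added.
    public static class RunIndexerOnUpload
    {
        [FunctionName("RunIndexerOnUpload")]
        public static async Task Run(
            // "documents/{name}" is a placeholder container path (assumption).
            [BlobTrigger("documents/{name}", Connection = "AzureWebJobsStorage")] Stream blob,
            string name,
            ILogger log)
        {
            // App settings are exposed to the function as environment variables.
            var endpoint = new Uri(Environment.GetEnvironmentVariable("SearchServiceUri"));
            var credential = new AzureKeyCredential(
                Environment.GetEnvironmentVariable("SearchServiceAdminApiKey"));
            var indexerClient = new SearchIndexerClient(endpoint, credential);
            string indexerName = Environment.GetEnvironmentVariable("IndexerName");

            try
            {
                // Ask the service to start an on-demand run; the call returns
                // as soon as the run is accepted, not when indexing finishes.
                await indexerClient.RunIndexerAsync(indexerName);
                log.LogInformation("Requested indexer run after upload of {name}", name);
            }
            catch (RequestFailedException ex) when (ex.Status == 409)
            {
                // Assumption: a 409 here means a run is already in progress.
                log.LogInformation("Indexer run not started for {name}: {message}", name, ex.Message);
            }
        }
    }
}
```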
To Reproduce
Steps to reproduce the behavior (include a code snippet, screenshot, or any additional information that might help us reproduce the issue)
Here is the problematic function, which performs the full reindexing on command:
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;
using Microsoft.Extensions.Configuration;
using Azure;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;
using SearchConstructs;
using System.Linq;
using System.Web.Http;
using Microsoft.Extensions.Azure;

namespace CogSearchIndexing
{
    public static class ResetIndexer
    {
        [FunctionName("ResetIndexer")]
        public static async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Function, "post", Route = null)] HttpRequest req,
            ILogger log, ExecutionContext context)
        {
            log.LogInformation("ResetIndexer is recreating the search index");

            var configuration = new ConfigurationBuilder()
                .SetBasePath(context.FunctionAppDirectory)
                .AddJsonFile("local.settings.json", optional: true, reloadOnChange: true)
                .AddEnvironmentVariables()
                .Build();

            Exception err;
            try
            {
                // Create a new SearchIndexClient
                Uri endpoint = new Uri(configuration["SearchServiceUri"]);
                AzureKeyCredential credential = new AzureKeyCredential(configuration["SearchServiceAdminApiKey"]);
                SearchIndexClient indexClient = new SearchIndexClient(endpoint, credential);

                // Delete synchronously: the indexer runs for a very long time, so
                // let's see whether ensuring deletion on the same thread helps.

                // Fully delete the index
                try
                {
                    indexClient.DeleteIndex(configuration["IndexName"]);
                }
                catch { } // if we fail, it's most likely that the index has already been removed, just keep going. If it's something else, we'll fail below

                SearchIndexerClient indexerClient = new SearchIndexerClient(endpoint, credential);

                // Fully delete the indexer
                try
                {
                    indexerClient.DeleteIndexer(configuration["IndexerName"]);
                }
                catch { } // if we fail, it's most likely that the indexer has already been removed, just keep going. If it's something else, we'll fail below

                // Fully delete the data source connection
                try
                {
                    indexerClient.DeleteDataSourceConnection(configuration["DataSourceConnectionName"]);
                }
                catch { } // if we fail, it's most likely that the data source connection has already been removed, just keep going. If it's something else, we'll fail below

                // Fully delete the skillset
                try
                {
                    indexerClient.DeleteSkillset(configuration["SkillsetName"]);
                }
                catch { } // if we fail, it's most likely that the skillset has already been removed, just keep going. If it's something else, we'll fail below

                // Build the data source connection definition
                SearchIndexerDataSourceConnection dataSourceConnection = SearchConstructs.Indexer.CreateBlobDataSourceConnection(
                    configuration["DataSourceConnectionName"],
                    configuration["AzureBlobConnectionString"],
                    configuration["BlobContainer"]);

                // Build the index definition
                SearchIndex index = SearchConstructs.Index.CreateIndex(configuration["IndexName"]);

                // Build the skillset definition
                SearchIndexerSkillset skillset = SearchConstructs.Skillset.CreateSkillset(configuration["SkillsetName"]);

                // Create or update the index, data source connection, and skillset on the service
                indexClient.CreateOrUpdateIndex(index);
                indexerClient.CreateOrUpdateDataSourceConnection(dataSourceConnection);
                indexerClient.CreateOrUpdateSkillset(skillset);

                // Build the indexer definition
                SearchIndexer indexer = SearchConstructs.Indexer.CreateIndexer(
                    configuration["IndexerName"],
                    configuration["DataSourceConnectionName"],
                    configuration["IndexName"],
                    configuration["SkillsetName"]);

                // We delete instead of just resetting because we sometimes change the index/indexer definitions.
                // In a real system, this would not delete anything but just call reset.
                // Really, you would need two separate functions: one for definition changes and one without.
                // await indexerClient.CreateOrUpdateIndexerAsync(indexer);
                // await indexerClient.ResetIndexerAsync(configuration["IndexerName"]);
                indexerClient.CreateIndexer(indexer);

                // Creating an indexer also runs it
                //indexerClient.RunIndexer(configuration["IndexerName"]);

                // Get and report the Search Service statistics
                Response<SearchServiceStatistics> stats = await indexClient.GetServiceStatisticsAsync();
                return new OkObjectResult($"{{\"indicesUsed\":\"{stats.Value.Counters.IndexCounter.Usage}\",\"indicesQuota\":\"{stats.Value.Counters.IndexCounter.Quota}\"}}");
            }
            catch (Exception x)
            {
                err = x;
            }

            return new ExceptionResult(err, true);
        }
    }
}
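The comments in the function mention a second path that resets the indexer instead of deleting everything. A minimal sketch of that path follows, using the same calls that are commented out above. The class and method names are hypothetical, and it assumes the index, data source connection, and skillset already exist and that no run is in progress when it is called.

```csharp
using System.Threading.Tasks;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

// Hypothetical helper: refresh an existing indexer without deleting any resources.
public static class IndexerRefresh
{
    public static async Task ResetAndRunAsync(SearchIndexerClient indexerClient, SearchIndexer indexer)
    {
        // Push the (possibly changed) indexer definition to the service.
        await indexerClient.CreateOrUpdateIndexerAsync(indexer);

        // Clear the change-tracking state so the next run reprocesses every document.
        await indexerClient.ResetIndexerAsync(indexer.Name);

        // Start an on-demand run. This is assumed to fail with a RequestFailedException
        // if another run is still in progress, so callers may want to handle that case.
        await indexerClient.RunIndexerAsync(indexer.Name);
    }
}
```

Inside the function above it could be called as `await IndexerRefresh.ResetAndRunAsync(indexerClient, indexer);` in place of the delete calls and `CreateIndexer`.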
Environment: Azure.Search.Documents 11.2.0-beta.2
dotnet --info

.NET SDK (reflecting any global.json):
  Version: 5.0.102
  Commit: 71365b4d42

Runtime Environment:
  OS Name: Windows
  OS Version: 10.0.19042
  OS Platform: Windows
  RID: win10-x64
  Base Path: C:\Program Files\dotnet\sdk\5.0.102\

Host (useful for support):
  Version: 5.0.2
  Commit: cb5f173b96

.NET SDKs installed:
  1.1.13 [C:\Program Files\dotnet\sdk]
  1.1.14 [C:\Program Files\dotnet\sdk]
  2.1.617 [C:\Program Files\dotnet\sdk]
  2.1.700 [C:\Program Files\dotnet\sdk]
  2.1.701 [C:\Program Files\dotnet\sdk]
  2.1.812 [C:\Program Files\dotnet\sdk]
  2.2.300 [C:\Program Files\dotnet\sdk]
  3.1.300 [C:\Program Files\dotnet\sdk]
  3.1.405 [C:\Program Files\dotnet\sdk]
  5.0.102 [C:\Program Files\dotnet\sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.All 2.1.11 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.All 2.1.12 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.All 2.1.24 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.All 2.2.5 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.1.11 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 2.1.12 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 2.1.24 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 2.2.5 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 3.1.11 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 5.0.2 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 1.0.15 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 1.0.16 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 1.1.12 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 1.1.13 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.1.11 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.1.12 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.1.24 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.2.5 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 3.1.11 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 5.0.2 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.WindowsDesktop.App 3.1.11 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 5.0.2 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]

To install additional .NET runtimes or SDKs: https://aka.ms/dotnet-download
- IDE and version: VS Enterprise 16.8.4. Note that the function itself returns in about 20 seconds; the slow performance occurs within the indexer run in Azure.
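Because the function returns long before the indexer finishes, the run has to be timed on the service side. One way to do that, sketched below, is to poll the indexer status and read the start and end times of the last execution. The class name, method names, and the polling and timeout parameters are hypothetical; the sketch assumes the `GetIndexerStatusAsync` call and the `LastResult` properties behave as in the released 11.x library.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

// Hypothetical helper for measuring how long an indexer run took on the service.
public static class IndexerTiming
{
    // Duration of a run, or null when either timestamp is missing.
    public static TimeSpan? Elapsed(DateTimeOffset? start, DateTimeOffset? end) =>
        start.HasValue && end.HasValue ? end.Value - start.Value : (TimeSpan?)null;

    // Polls until the most recent execution is no longer in progress.
    // Returns null if the timeout expires first.
    public static async Task<IndexerExecutionResult> WaitForLastRunAsync(
        SearchIndexerClient client, string indexerName, TimeSpan pollInterval, TimeSpan timeout)
    {
        DateTimeOffset deadline = DateTimeOffset.UtcNow + timeout;
        while (DateTimeOffset.UtcNow < deadline)
        {
            SearchIndexerStatus status = await client.GetIndexerStatusAsync(indexerName);
            IndexerExecutionResult last = status.LastResult;

            // LastResult can be null right after the indexer is created.
            if (last != null && last.Status != IndexerExecutionStatus.InProgress)
            {
                return last;
            }

            await Task.Delay(pollInterval);
        }

        return null;
    }
}
```

A caller could then log `IndexerTiming.Elapsed(last.StartTime, last.EndTime)` together with `last.ItemCount` and `last.FailedItemCount` to compare the one-file-at-a-time runs against the full reindexing run.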
Issue Analytics
- Created 3 years ago
- Comments: 12 (8 by maintainers)
Top GitHub Comments
@AlexGhiondea @Mohit-Chakraborty For indexer-related issues, please engage @bleroy
This no longer reproduces.