[BUG] Blob Trigger Scan continuously firing, and processing the same blobs

Library name and version

Azure.Storage.Blobs 12.15.1

Describe the bug

I have a blob trigger set up to monitor one of my containers in Azure Storage. The associated blob scan fires non-stop, looping over the same blobs again and again and causing excessive operations on the storage account.

Expected behavior

I would expect blobs that have already been processed to be skipped.

Actual behavior

A new scan fires immediately after the previous one finishes. I'm not sure whether that is expected behaviour, but what is not expected is that the scan detects the same blobs over and over again.

I have noticed that at the start of each scan, this log message is printed to the function logs:

[Verbose] Poll for blobs newer than '0001-01-01T00:00:00.000' in container '<container>' with ClientRequestId 'xxx' found 1839 blobs in 302 ms. ContinuationToken: False

Comparing with other blob triggers, I have discovered that the relevant scaninfo file seems to not have been created/updated. If I manually create the scaninfo file with a date/time, then the next scan will poll for blobs newer than that time, but the scaninfo file is then not updated again automatically.

Blob receipts are being created, and each blob is processed with a message similar to the following:

[Verbose] Blob '<blob name>' will be skipped for function 'FillBlobMetaData' because this blob with ETag '"0x8D8798F7B192688"' has already been processed. PollId: '566d148e-11d0-4a80-86a0-c7d4cfab3b58'. Source: 'ContainerScan'.

Reproduction Steps

Enable my function within the function app on the Azure Portal.

Environment

.NET 6.0 Azure Functions app, runtime version 4.16.5.20396

Microsoft.Azure.WebJobs.Extensions.Storage.Blobs v5.1.1

Issue Analytics

  • State: open
  • Created: 6 months ago
  • Reactions: 2
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

alexsorokoletov commented, May 24, 2023 (2 reactions)

+1 to this problem. It also applies to Node.js-based functions.

badbort commented, Aug 15, 2023 (0 reactions)

I encountered the same issue with my dotnet function. I eventually tracked it down to the ScanBlobScanLogHybridPollingStrategy.PollNewBlobsAsync method.

This was causing millions of transactions per day.

The paged blob API returns a continuation token that indicates whether the current page is the last one. That token is an empty string on the last page, but the code was checking it against null. As a result, the "newer than" date was never assigned and always stayed at DateTime.MinValue, so every blob was checked against that date again and again.
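The mismatch described above can be sketched in a few lines of Python. This is only an illustration of the null-versus-empty-string check, not the extension's actual code (which is C#); the function names `is_last_page_buggy`, `is_last_page_fixed`, and `next_watermark` are hypothetical, and the rule that the watermark only advances on the last page is an assumption made for the sketch.

```python
from datetime import datetime, timezone
from typing import Callable, Optional

# Stand-in for DateTime.MinValue: the watermark before any sweep has completed.
MIN_TIME = datetime.min.replace(tzinfo=timezone.utc)


def is_last_page_buggy(continuation_token: Optional[str]) -> bool:
    # Only recognises None, so an empty-string token never counts as "last page".
    return continuation_token is None


def is_last_page_fixed(continuation_token: Optional[str]) -> bool:
    # Treats both None and "" as "no more pages".
    return not continuation_token


def next_watermark(
    current: datetime,
    newest_seen: datetime,
    continuation_token: Optional[str],
    is_last_page: Callable[[Optional[str]], bool],
) -> datetime:
    # Assumption for this sketch: the watermark only advances once the
    # sweep has reached the last page of results.
    return newest_seen if is_last_page(continuation_token) else current


newest = datetime(2023, 8, 15, tzinfo=timezone.utc)

# The service reports the final page with an empty-string token.
stuck = next_watermark(MIN_TIME, newest, "", is_last_page_buggy)
advanced = next_watermark(MIN_TIME, newest, "", is_last_page_fixed)
```

With the null-only check, `stuck` stays at the minimum date, which matches the "newer than '0001-01-01T00:00:00.000'" line in the logs; with the empty-string-aware check, `advanced` moves to the newest timestamp seen.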

I noticed someone else fixed the issue here: https://github.com/Azure/azure-sdk-for-net/commit/a69839f4a33ab39a7d1441e36415f08e00b76ff0

I think Microsoft.Azure.WebJobs.Extensions.Storage.Blobs 5.1.2 introduced the fix

edit: After switching to the latest version (5.1.3) I still found this polling mechanism continuously checks the most recent batch of files due to these lines:

https://github.com/Azure/azure-sdk-for-net/blob/c4fb4c52117622c2107163ba6d3efbb20743836d/sdk/storage/Microsoft.Azure.WebJobs.Extensions.Storage.Blobs/src/Listeners/ScanBlobScanLogHybridPollingStrategy.cs#L265-L268

In the lines above, CurrentSweepCycleLatestModified is set to the most recent last-modified date. It then becomes LastSweepCycleLatestModified for the next sweep.

If no new files come in, LastSweepCycleLatestModified stays at the last-modified date of the most recent file, unchanged.

As a result, those most recent files will be checked over and over again, continuously:

https://github.com/Azure/azure-sdk-for-net/blob/c4fb4c52117622c2107163ba6d3efbb20743836d/sdk/storage/Microsoft.Azure.WebJobs.Extensions.Storage.Blobs/src/Listeners/ScanBlobScanLogHybridPollingStrategy.cs#L270-L275

According to the earlier code comment about rounding, it seems you can't really change this logic.

In my case, I had copied over a batch of 60 files at the same time (out of ~700 in the container), and they were continuously being checked every 10 seconds (the PollingInterval).
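The re-check behaviour described in this comment can be reproduced with a small Python simulation. It is a sketch under stated assumptions rather than the extension's logic: `poll` is a hypothetical helper, the inclusive `>=` comparison is assumed from the remark about timestamp rounding, and the container is modelled as exactly 700 blobs, 60 of which share the newest timestamp.

```python
from datetime import datetime, timedelta, timezone


def poll(blobs, last_sweep_latest_modified):
    """Run one sweep and return (candidate blob names, new watermark)."""
    # Assumed inclusive comparison: blobs stamped exactly at the watermark
    # are picked up again on every sweep.
    candidates = [
        name
        for name, modified in blobs.items()
        if modified >= last_sweep_latest_modified
    ]
    newest = max(blobs.values(), default=last_sweep_latest_modified)
    return candidates, max(newest, last_sweep_latest_modified)


base = datetime(2023, 8, 15, tzinfo=timezone.utc)

# 640 older blobs plus a batch of 60 copied in at the same instant.
blobs = {f"old-{i}": base - timedelta(days=1, seconds=i) for i in range(640)}
blobs.update({f"new-{i}": base for i in range(60)})

watermark = datetime.min.replace(tzinfo=timezone.utc)
counts = []
for _ in range(3):
    candidates, watermark = poll(blobs, watermark)
    counts.append(len(candidates))
```

Here `counts` ends up as `[700, 60, 60]`: the first sweep sees everything, and every later sweep re-examines the 60 most recent blobs because the watermark cannot move past their shared timestamp until a newer blob arrives.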
