
Continuous EventHub webjob fails with StorageException (TimeoutException) and holds up subsequent processing.

See original GitHub issue

I am seeing an error similar to the one in issue #953, but I do not believe it is related to CPU load (our CPU load is always low). I believe it has to do with delays or overload on the Storage Blob connection used for the Dashboard logging information.

I have a few webjobs that consume Event Hub messages in batches. At times there are delays in messages being consumed, and I find several of these TimeoutException errors in the diagnostic log every hour.

After some analysis over the past week, I believe it is caused by the mechanisms that upload webjob dashboard data. I disabled all dashboard interaction by setting JobHostConfiguration.DashboardConnectionString to null. Since I did this for all my webjobs, I have seen no further delay gaps in the processing of the Event Hub messages.
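
For reference, here is a minimal sketch of a WebJobs SDK 2.x console host with the dashboard connection disabled; the Event Hub trigger registration is omitted, and this setup is an assumption for illustration, not the author's actual code:

        using Microsoft.Azure.WebJobs;

        class Program
        {
            static void Main()
            {
                var config = new JobHostConfiguration();

                // Setting the dashboard connection string to null stops the SDK from
                // writing per-invocation logs and blobs to the dashboard storage account.
                config.DashboardConnectionString = null;

                var host = new JobHost(config);
                host.RunAndBlock();
            }
        }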

I am using the 2.0.0-beta2 versions of the WebJobs SDK and Service Bus packages. All non-WebJobs-SDK NuGet packages are at the latest production releases as of 1/18/2017.

This is the declaration for the EventHubTrigger-based WebJob function:

        public static async Task ProcessGeoTraqrEventHubMessagesAsync(
            [EventHubTrigger("GeoTraqrEventHub")] EventData[] geoTraqrEventDatas,
            CancellationToken cancellationToken)
        {
            // Body omitted in the issue; the function processes the batch of EventData messages.
        }

The following CSV log file excerpt shows the exception detail for when the TimeoutException occurs:

date,level,applicationName,instanceId,eventTickCount,eventId,pid,tid,message,activityId
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,Executed: 'GeoTraqrProcessor.ProcessGeoTraqrEventHubMessagesAsync' (Failed),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,Microsoft.WindowsAzure.Storage.StorageException: The client could not finish the operation within specified timeout. ---> System.TimeoutException: The client could not finish the operation within specified timeout.,
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,   --- End of inner exception stack trace ---,
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,   at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.EndExecuteAsync[T](IAsyncResult result),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,   at Microsoft.WindowsAzure.Storage.Blob.CloudBlobContainer.EndCreate(IAsyncResult asyncResult),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,   at Microsoft.WindowsAzure.Storage.Blob.CloudBlobContainer.EndCreateIfNotExists(IAsyncResult asyncResult),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,   at Microsoft.WindowsAzure.Storage.Core.Util.AsyncExtensions.<>c__DisplayClass1`1.<CreateCallback>b__0(IAsyncResult ar),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,--- End of stack trace from previous location where exception was thrown ---,
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,   at Microsoft.Azure.WebJobs.Host.Protocols.PersistentQueueWriter`1.<EnqueueAsync>d__0.MoveNext(),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,--- End of stack trace from previous location where exception was thrown ---,
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,   at Microsoft.Azure.WebJobs.Host.Loggers.CompositeFunctionInstanceLogger.<LogFunctionStartedAsync>d__0.MoveNext(),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,--- End of stack trace from previous location where exception was thrown ---,
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,   at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<ExecuteWithLoggingAsync>d__1a.MoveNext(),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,--- End of stack trace from previous location where exception was thrown ---,
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<TryExecuteAsync>d__2.MoveNext(),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,Request Information,
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,RequestID:,
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,RequestDate:,
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,StatusMessage:,
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,,

The stack trace indicates that this is entirely SDK-contained code, most likely executing after the triggered webjob function has completed. I know that several blobs are updated and/or created in the dashboard every time a function executes. With the traffic we get in our Event Hubs, these jobs execute a function several times per second at peak, and usually at least once per second.

With the Dashboard storage connection removed, everything works great, at least as far as the Event Hub message-processing jobs are concerned. This isn't an ideal solution, however, since it basically shuts off access to dashboard storage. The primary problem will be with the continuous webjob I have that runs on a timer trigger and is also marked as a Singleton. I know that the blob-based lock for the singleton uses the dashboard storage connection, so that won't work if I scale horizontally. For now I will have to use the database locking system and custom-code the single-instance enforcement for the timer-triggered job function. As I mentioned, not an ideal solution.
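
As an illustration only (not the author's actual code), one way to enforce a single running instance without the blob lease is a SQL Server application lock via sp_getapplock; the class and lock resource names below are hypothetical:

        using System.Data;
        using System.Data.SqlClient;

        static class TimerJobLock
        {
            // Tries to take an exclusive, session-scoped application lock.
            // Returns true if this instance won the lock and may run the timer job.
            public static bool TryAcquire(SqlConnection openConnection)
            {
                using (var cmd = openConnection.CreateCommand())
                {
                    cmd.CommandText = "sp_getapplock";
                    cmd.CommandType = CommandType.StoredProcedure;
                    cmd.Parameters.AddWithValue("@Resource", "GeoTraqrTimerJob"); // hypothetical lock name
                    cmd.Parameters.AddWithValue("@LockMode", "Exclusive");
                    cmd.Parameters.AddWithValue("@LockOwner", "Session");
                    cmd.Parameters.AddWithValue("@LockTimeout", 0);               // fail fast if another instance holds it

                    var result = cmd.Parameters.Add("@Result", SqlDbType.Int);
                    result.Direction = ParameterDirection.ReturnValue;

                    cmd.ExecuteNonQuery();
                    return (int)result.Value >= 0; // 0 = granted, 1 = granted after waiting; negative values mean failure
                }
            }
        }

The lock is owned by the session, so the connection must stay open while the timer function runs; closing it releases the lock.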

Issue Analytics

  • State: open
  • Created 7 years ago
  • Reactions: 2
  • Comments: 13 (2 by maintainers)

Top GitHub Comments

1 reaction
lovettchris commented, Nov 17, 2020

I was having the same problem with Azure Functions (built on WebJobs) until I switched to a more expensive VM that runs my job with more than one core. That way the health monitoring is not blocked by the work my job does. I would argue that since the WebJobs architecture requires health monitoring, you should pay for (and automatically configure) that extra core to get health information back from my job instead of me having to worry about it.

1 reaction
bz0108 commented, Jul 31, 2017

@ArnimSchinz

How about a test: add the following code at the beginning of your app, or at least before calling JobHost.RunAndBlock (if you call it):

        System.Threading.ThreadPool.SetMinThreads(200, 200);
        System.Net.ServicePointManager.DefaultConnectionLimit = 1920;

Then test it.
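
For anyone trying this suggestion, a sketch of where those calls would go in the console host entry point (the thread-pool and connection-limit numbers are the commenter's; whether they suit a given workload is an assumption to validate):

        static void Main()
        {
            // Raise the thread-pool floor and the outbound connection limit before the
            // host starts, so storage and dashboard calls are less likely to queue.
            System.Threading.ThreadPool.SetMinThreads(200, 200);
            System.Net.ServicePointManager.DefaultConnectionLimit = 1920;

            var config = new JobHostConfiguration();
            new JobHost(config).RunAndBlock();
        }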

Top Results From Across the Web

  • Long running webjob fail due to Azure Storage timeout
    The issue is with a long running azure webjob on a daily schedule. Each run takes 2-4 hours doing data analytics. The only...
  • EventHub trigger errors "Microsoft.Azure. ..."
    We are seeing these errors pop up across many different Azure functions with Event Hub triggers. We have not made any code changes....
  • Server failed to authenticate the request
    Basically your code was failing because you were using Encoding.UTF8.GetBytes(accountKey). We would need to use Convert.FromBase64String(accountKey).
