Continuous EventHub webjob fails with StorageException (TimeoutException) and holds up subsequent processing.
I have a similar error to issue #953, but I do not believe it is related to CPU load (our CPU load is always low). I believe it has to do with delays or overloads in the Storage Blob connection used for the Dashboard logging information.
I have a few web jobs that consume event hub messages in batches. At times there are delays before messages are consumed, and I find several of these TimeoutException errors in the diagnostic log every hour.
After some analysis over the past week, I believe the delays are caused by the mechanism that uploads webjob dashboard data. I disabled all dashboard interaction by setting JobHostConfiguration.DashboardConnectionString to null. Since doing this for all my webjobs, I have had no further gaps or delays in the processing of the event hub messages.
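For reference, disabling the dashboard amounts to a host setup along these lines (a minimal sketch for the WebJobs SDK 2.x; the EventHub trigger configuration is omitted):

```csharp
using Microsoft.Azure.WebJobs;

class Program
{
    static void Main()
    {
        var config = new JobHostConfiguration();

        // Setting the dashboard connection string to null disables all
        // dashboard interaction: function executions no longer write
        // status blobs to the AzureWebJobsDashboard storage account.
        config.DashboardConnectionString = null;

        var host = new JobHost(config);
        host.RunAndBlock();
    }
}
```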
I am using the 2.0.0-beta2 versions of the WebJobs SDK and Service Bus packages. All non-WebJobs-SDK NuGet packages are the latest production releases as of 1/18/2017.
This is the signature of the EventHubTrigger-based WebJob function (body omitted):

public static async Task ProcessGeoTraqrEventHubMessagesAsync(
    [EventHubTrigger("GeoTraqrEventHub")] EventData[] geoTraqrEventDatas,
    CancellationToken cancellationToken);
The following CSV log file excerpt shows the exception detail for when the TimeoutException occurs:
date,level,applicationName,instanceId,eventTickCount,eventId,pid,tid,message,activityId
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,Executed: 'GeoTraqrProcessor.ProcessGeoTraqrEventHubMessagesAsync' (Failed),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,Microsoft.WindowsAzure.Storage.StorageException: The client could not finish the operation within specified timeout. ---> System.TimeoutException: The client could not finish the operation within specified timeout.,
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39, --- End of inner exception stack trace ---,
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39, at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.EndExecuteAsync[T](IAsyncResult result),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39, at Microsoft.WindowsAzure.Storage.Blob.CloudBlobContainer.EndCreate(IAsyncResult asyncResult),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39, at Microsoft.WindowsAzure.Storage.Blob.CloudBlobContainer.EndCreateIfNotExists(IAsyncResult asyncResult),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39, at Microsoft.WindowsAzure.Storage.Core.Util.AsyncExtensions.<>c__DisplayClass1`1.<CreateCallback>b__0(IAsyncResult ar),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,--- End of stack trace from previous location where exception was thrown ---,
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39, at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39, at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39, at Microsoft.Azure.WebJobs.Host.Protocols.PersistentQueueWriter`1.<EnqueueAsync>d__0.MoveNext(),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,--- End of stack trace from previous location where exception was thrown ---,
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39, at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39, at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39, at Microsoft.Azure.WebJobs.Host.Loggers.CompositeFunctionInstanceLogger.<LogFunctionStartedAsync>d__0.MoveNext(),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,--- End of stack trace from previous location where exception was thrown ---,
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39, at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39, at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39, at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39, at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<ExecuteWithLoggingAsync>d__1a.MoveNext(),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,--- End of stack trace from previous location where exception was thrown ---,
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39, at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39, at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39, at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<TryExecuteAsync>d__2.MoveNext(),
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,Request Information,
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,RequestID:,
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,RequestDate:,
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,StatusMessage:,
2017-01-19T14:03:20,Information,my-application-name,6d3b73,636204314003508575,0,12268,39,,
The stack trace indicates that the failure occurs entirely within SDK code, most likely executing after the triggered webjob function has completed. I know that several blobs are updated and/or created in the dashboard every time a function executes. With the traffic that we get in our event hubs, these jobs have a function execution occurring several times per second at peak, and usually at least once per second, give or take a few ms.
So now, without the Dashboard storage connection, everything works great, at least as far as the event hub message processing jobs are concerned. Shutting off access to dashboard storage entirely is not an ideal solution, however. The primary problem will be with the continuous webjob I have that runs on a timer trigger and is also marked as a Singleton. I understand that the blob-based lock for the singleton uses the dashboard storage connection, so that won't work if I scale horizontally. For now I will have to use a database locking scheme and write custom code to enforce a single instance of the timer-triggered job function. As I mentioned, not an ideal solution.
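For anyone taking the same route, single-instance enforcement can also be approximated with a blob lease held against the primary (non-dashboard) storage account. This is only a sketch under assumed names (a "locks" container and "timer-job.lock" blob, neither from the original issue) using the Microsoft.WindowsAzure.Storage client:

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class BlobLeaseLock
{
    // Try to acquire a 60-second lease on a well-known blob. Returns the
    // lease id on success, or null if another instance already holds it.
    public static string TryAcquire(string storageConnectionString)
    {
        var account = CloudStorageAccount.Parse(storageConnectionString);
        var container = account.CreateCloudBlobClient()
                               .GetContainerReference("locks");
        container.CreateIfNotExists();

        var blob = container.GetBlockBlobReference("timer-job.lock");
        if (!blob.Exists())
        {
            blob.UploadText(string.Empty); // the lease target must exist
        }

        try
        {
            // A real implementation would renew the lease on a timer
            // before it expires, and release it on shutdown.
            return blob.AcquireLease(TimeSpan.FromSeconds(60), null);
        }
        catch (StorageException)
        {
            return null; // lease already held by another instance
        }
    }
}
```

Note this still depends on storage availability, so it shares the same failure mode as the dashboard if the storage account itself is slow; a database lock avoids that dependency.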
Issue Analytics
- State:
- Created 7 years ago
- Reactions:2
- Comments:13 (2 by maintainers)
Top GitHub Comments
I was having the same problem with Azure Functions (built on WebJobs) until I switched to a more expensive VM that runs my job with more than one core. That way the health monitoring is not blocked by the work my job does. I would argue that since the WebJobs architecture requires health monitoring, the platform should provide (and automatically configure) that extra core to get health information back from my job, instead of me having to worry about it.
@ArnimSchinz
How about a test: add the following code at the beginning of your app, or at least before calling JobHost.RunAndBlock (if you call it), then test again.
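The snippet itself was not preserved in this excerpt. A commonly suggested pre-host tweak for Azure Storage client timeouts, offered here only as an assumption about what the comment likely contained, looks like this:

```csharp
using System.Net;

class Startup
{
    // Run before JobHost.RunAndBlock(). These ServicePoint settings are a
    // common mitigation for storage client timeouts; they are an assumed
    // reconstruction, not the comment's original (unpreserved) snippet.
    public static void TuneServicePoints()
    {
        // The default is 2 concurrent connections per host, which can
        // starve the storage client under heavy dashboard logging.
        ServicePointManager.DefaultConnectionLimit = 100;

        // Avoid the extra 100-Continue round trip on PUT requests.
        ServicePointManager.Expect100Continue = false;

        // Disable Nagle's algorithm for small, latency-sensitive requests.
        ServicePointManager.UseNagleAlgorithm = false;
    }
}
```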