[QUERY] Handling timeouts
Query/Question
Hi, I have an application that automatically creates and destroys some resources in Azure North Europe for use in automated tests. Over the past week we have had many problems with resources taking a long time to create or failing to create, presumably due to the current high demand. We have also noticed that sometimes the SDK reports a timeout and throws, but the resource does eventually get created. In these cases, we catch the exception and retry creating the resource, but because the original resource does eventually get created, we end up with many more resources than we need.
I’ve been looking at how we can increase timeouts, or otherwise handle this situation better. There are two things I want to try but don’t fully understand the implications of:
- Setting IAzureClient.LongRunningOperationRetryTimeout to a high value.
- Setting IAzureClient.HttpClient.Timeout to a high value.
Would doing either of these (or a combination of both) mean that the SDK waits until ARM has actually created or failed to create the resource? I don’t mind resources taking a long time to create, but I want the process to be deterministic. Having our system think resources have failed to create, only for them to appear some time later, is problematic. If there are other ways of handling this, and I’m looking in the wrong place, let me know!
Environment:
- Name and version of the Library package used: Microsoft.Azure.Management.Fluent (v1.31.0)
- Hosting platform or OS and .NET runtime version (dotnet --info output for .NET Core projects): Linux dotnet core 2.2.6
- IDE and version: Visual Studio 16.5.0
Issue Analytics
- Created 3 years ago
- Comments: 8 (5 by maintainers)
Top GitHub Comments
RetryPolicy is pretty complicated and its effect is hard to predict across different use cases, so by default only retry on 429 (too many requests) is included.
Whether it is helpful or not depends on the nature of the failure. E.g. if the response to a PUT does not get back to you, a RetryPolicy is probably not going to help much.
You can configure it, but do be careful (you might want to exclude 501 (NOT_IMPLEMENTED) and 505 (VERSION)).
Setting IAzureClient.LongRunningOperationRetryTimeout is likely not related to the issue. It is the default retry-after value for an LRO (long running operation). Usually the service returns its own value, which overrides it.
Setting IAzureClient.HttpClient.Timeout might help a bit. However, I think ARM itself has a timeout of about 1 or 2 minutes (https://github.com/Azure/azure-resource-manager-rpc/blob/master/v1.0/common-api-details.md#client-request-timeout), so a value larger than 2 minutes will probably have no effect.
Depending on the nature of the problem, adding a RetryPolicy might help if the cause is actually packet loss or a timeout in the polling phase of the LRO (after resource provisioning has been accepted but not completed).
@yungezz
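For the case where the response to a PUT never comes back, one option outside the SDK settings is to check whether the resource exists before issuing another create. The following is only a minimal sketch of that idea, not something the SDK provides: `CreateOrConfirm`, `EnsureCreatedAsync`, and the `create` and `exists` delegates are hypothetical names, and you would wire them to whatever create and get calls you already use. It assumes every attempt reuses the same resource name, so a repeated create targets the same resource rather than a new one.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of Microsoft.Azure.Management.Fluent.
public static class CreateOrConfirm
{
    // create: runs your existing create call for a fixed resource name.
    // exists: asks the service whether that same resource is present.
    // Returns true once the resource is confirmed, false if every attempt fails.
    public static async Task<bool> EnsureCreatedAsync(
        Func<Task> create,
        Func<Task<bool>> exists,
        int maxAttempts,
        int checksPerAttempt,
        TimeSpan checkInterval)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                await create();
                return true;
            }
            // HttpClient timeouts surface as TaskCanceledException,
            // which derives from OperationCanceledException.
            catch (Exception ex) when (ex is TimeoutException || ex is OperationCanceledException)
            {
                // The client gave up, but the service may still finish the
                // create, so look for the resource before creating again.
                for (int check = 0; check < checksPerAttempt; check++)
                {
                    if (await exists())
                    {
                        return true;
                    }
                    await Task.Delay(checkInterval);
                }
            }
        }
        return false;
    }
}
```

A call might look like `await CreateOrConfirm.EnsureCreatedAsync(createCall, existsCall, 3, 10, TimeSpan.FromSeconds(30))`, where `createCall` and `existsCall` are your own delegates. The attempt counts and interval are placeholders to tune, not recommended values.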