Thread safety and suggested lifetime patterns of storage classes not documented
The docs (e.g., https://docs.microsoft.com/en-us/dotnet/api/microsoft.windowsazure.storage.cloudstorageaccount) don’t mention anything about the thread safety or suggested lifetime/usage patterns of `CloudStorageAccount`, `CloudBlobClient`, `CloudBlobContainer`, etc.
In a server application, how are those supposed to be created and retained? E.g., `CloudStorageAccount` and `CloudBlobClient` instances as global singletons, `CloudBlobContainer` per call? The sample at https://github.com/Azure-Samples/storage-blobs-dotnet-webapp/blob/master/WebApp-Storage-DotNet/Controllers/HomeController.cs instantiates all these classes per call. Is this the suggested usage, or is it actually expensive to reinstantiate them every time?
(The “Cloud_Storage_Account_Sample” link at https://docs.microsoft.com/en-us/dotnet/api/microsoft.windowsazure.storage.cloudstorageaccount?view=azure-dotnet doesn’t work, BTW, but even if it did, the sample doesn’t show how to use those classes in production.)
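For concreteness, the two lifetime options in the question could look roughly like the sketch below. It uses the legacy `Microsoft.WindowsAzure.Storage` package. The class names `SharedStorage` and `PerCallStorage` and the emulator connection string are illustrative assumptions, not anything from the docs. The sketch only shows the shape of each option; it does not establish that sharing a client across threads is safe, which is exactly the undocumented point.

```csharp
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

// Option A: account and client created once and shared for the process lifetime.
// ASSUMPTION: this is only appropriate if these types are safe for concurrent use,
// which the documentation does not state.
public static class SharedStorage
{
    // Hypothetical connection string; this value targets the local storage emulator.
    private const string ConnectionString = "UseDevelopmentStorage=true";

    private static readonly CloudStorageAccount Account =
        CloudStorageAccount.Parse(ConnectionString);

    private static readonly CloudBlobClient Client =
        Account.CreateCloudBlobClient();

    // The container reference is still created per call, as the question suggests.
    public static CloudBlobContainer GetContainer(string name) =>
        Client.GetContainerReference(name);
}

// Option B: everything re-created on every call, as the linked sample does.
public static class PerCallStorage
{
    public static CloudBlobContainer GetContainer(string connectionString, string name)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        CloudBlobClient client = account.CreateCloudBlobClient();
        return client.GetContainerReference(name);
    }
}
```

Neither option performs network I/O in the code shown; whether Option B is meaningfully more expensive, and whether Option A is safe under concurrency, is what the documentation would need to answer.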
Issue Analytics
- State:
- Created 5 years ago
- Reactions:63
- Comments:56 (8 by maintainers)
Top GitHub Comments
Well, a lot of +1s, a non-trivial number of watchers and several brief comments asking for help aren’t prompting a response, so here goes …
Nothing in the original issue report has yet been addressed in all this discussion over the last 16 months.
Breaking this issue down:
I agree with the following:
What documentation do we currently have, if any?
If we look at Azure / Function / How-to guides / Develop / Manage Connections / Static Clients it says what @moshegutman quoted previously (version-specific link):
but it also says:
Which in turn says (version-specific link):
There is no mention of storage classes, and whereas `QueueClient` actually has documentation on thread safety and storage patterns (if you can stumble across it), storage broker classes don’t seem to have corresponding documentation, even under the equivalent documentation section.
That last page then goes on to say:
Finally, we have the cited sample code, which is just plain bad code and no help at all. (Overwriting a `static` field on every call to `Index`??? 😱)
Why is this documentation not helping?
So here’s the kicker: this issue is all about the fact that (other than one throw-away reference hidden in the azure functions documentation with no supporting documentation in the actual documentation for the relevant classes):
`CloudBlobClient` or the newer v12 `BlobServiceClient` etc.; and
Other asides in this thread.
To revisit @kfarmer-msft’s (well-meaning) comment:
I respectfully, but firmly, take the view that this isn’t answering the issue. This may answer the OP’s “is it actually expensive to reinstantiate them every time” question, but not the underlying problem. It may be a performance optimisation to work out if object re-use can improve performance, but the other aspect is that it is fundamental to building a basic application to know how to use objects that own and use expensive network resources in order to address scaling and to prevent resource starvation. 1
If we are forced to do what you recommend, then we are faced with the choice of either (a) never getting a project off the ground, as it will become mired in blackbox testing; or (b) risking immediate failures on deployment to production. This is the exact reason why we now have `HttpClient` and `HttpClientFactory` with authoritative guidance on how to use them. There is only one way to use `HttpClient`, not different ways you choose depending on your best guess of usage patterns, performance and expected scale.
What are our current options, as customers of the storage SDK?
The problem with the current lack of documentation is that we are doomed if we do and doomed if we don’t:
TL;DR:
There is no guidance on thread safety for critical classes in the Azure SDK like `CloudBlobClient` or `CloudBlobContainer` or the newer v12 `BlobServiceClient` etc. We need a documentation point for each critical type on each of: thread-safety documentation; concurrent async calls; and lifetime recommendations. At the very least we need this guidance for the Azure storage classes in a page like this one (although this might be better placed at the type and member level for Azure storage SDK classes, and it would be nice to have a consistent approach across all the Azure SDKs).
Footnote: When I want to know how to use a type, I expect to go to the type’s documentation page, like `BlobContainerClient` or `CloudBlobClient`, instead of having to stumble across unrelated pages like the Azure Functions documentation. At this point in time most of those type documentation pages offer nothing beyond simple function overload parameter intellisense in the IDE. A thread-safety/concurrency section for types/members as appropriate, and a “lifetime and usage pattern” recommendation (singleton, pooled, transient) section for types, would completely sort this issue out. It would also handle consistent documentation for legacy versions after major-version, namespace and even assembly name/NuGet changes.
† Even `Microsoft.Extensions.ObjectPool` doesn’t handle things like `IDisposable` or `IAsyncDisposable`, or pooling objects by key, and building async-friendly, disposable-friendly, keyed pools isn’t a trivial piece of work.
‡ Maybe they expire old connections in the background and don’t silently stand them up again? So you can get non-deterministic errors at runtime due to stale connections.
1 Also, it is not the role of a customer of an SDK to blackbox test every class for thread safety, race conditions, and performance and adjust their usage of the SDK accordingly. Apart from anything else, if we were all to do this, we’d each be reinventing the wheel every time, and it would be a massive waste of global resources!
@kfarmer-msft My question arose from the fact that unit of work scoping (or per call scoping) is not the way an `HttpClient` should be used, as this pattern could cause port exhaustion if many units of work (e.g., web requests) are processed in a short time. So I checked the documentation for the Azure Storage clients and found nothing.
It’s cool that you recommend a unit of work pattern and that apparently this will minimize the creation of `HttpClient` instances at the moment, but I’m uneasy when you say, as I understand it, that you cannot guarantee this will hold for the future. I can’t build my application on a “one storage client per unit of work” pattern now and have it run into port exhaustion after updating the Azure client libraries.
I have no specific performance question. My goal is to use the storage clients in a way that is capable of parallel processing/multi-threading and will not cause port exhaustion. I really think the Azure Storage team needs to document how I can reach this goal. (In the API documentation, not some random GitHub issue.)
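To illustrate the `HttpClient` concern behind this comment, here is a minimal sketch of the shared-instance pattern next to the per-call pattern. The `Downloader` class and its method names are hypothetical; only `HttpClient` and `GetStringAsync` come from the .NET base class library. It says nothing about how the storage clients manage their own HTTP connections internally, which is the open question.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public static class Downloader
{
    // One instance shared for the lifetime of the process. HttpClient is
    // documented as intended to be instantiated once and reused.
    private static readonly HttpClient Shared = new HttpClient();

    // Reuses the shared client and its pooled connections.
    public static Task<string> GetWithSharedClientAsync(string url) =>
        Shared.GetStringAsync(url);

    // Per-call (unit of work) scoping: each call creates and disposes its own
    // client. Under heavy load this can leave many sockets lingering and
    // exhaust the available ports.
    public static async Task<string> GetWithPerCallClientAsync(string url)
    {
        using (var client = new HttpClient())
        {
            return await client.GetStringAsync(url);
        }
    }
}
```

Both methods return the same result for a single request; the difference only shows up when many requests are processed in a short time.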