[Question] Cannot use non-ASCII chars in metadata?
Describe the bug
It is not possible to write blob metadata containing non-ASCII characters.
Also affects Microsoft Azure Storage SDK for .NET (11.0.0). https://github.com/Azure/azure-storage-net/issues/975
Expected behavior
You should be able to save blob metadata containing non-ASCII characters.
Actual behavior (include Exception or Stack Trace)
System.AggregateException: ‘Retry failed after 6 tries.’ (.NET Core 3.1.1; Microsoft Windows 10.0.18363)
RequestFailedException: Request headers must contain only ASCII characters.
This exception was originally thrown at this call stack:
System.Net.Http.HttpConnection.WriteStringAsync(string)
System.Net.Http.HttpConnection.WriteHeadersAsync(System.Net.Http.Headers.HttpHeaders, string)
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(System.Threading.Tasks.Task)
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
System.Runtime.CompilerServices.ConfiguredTaskAwaitable.ConfiguredTaskAwaiter.GetResult()
System.Net.Http.HttpConnection.SendAsyncCore(System.Net.Http.HttpRequestMessage, System.Threading.CancellationToken)
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(System.Threading.Tasks.Task)
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
...
[Call Stack Truncated]
To Reproduce
Write metadata to a blob containing a non-ASCII character (such as “ñ”).
using System;
using System.Collections.Generic;
using Azure.Storage.Blobs;

namespace BlobExperiment
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Hello World!");
            const string connectionString =
                "DefaultEndpointsProtocol=https;AccountName=<account-name>;AccountKey=<account-key>;EndpointSuffix=core.windows.net";
            var client = new BlobContainerClient(connectionString, "test");
            var blobClient = client.GetBlobClient("test.jpg");
            var metadata = new Dictionary<string, string>();
            metadata.Add("Test", "ñ"); // The notorious ñ.
            blobClient.SetMetadata(metadata);
        }
    }
}
Environment:
- Azure.Storage.Blobs 12.4.0
- Windows 10, .NET Core 3.1.101
- Visual Studio 16.4.6
Issue Analytics
- Created 4 years ago
- Comments: 9 (2 by maintainers)
Top Results From Across the Web
- AWS SDK .NET custom metadata with non-ASCII characters: You cannot use non-ASCII characters for S3 user-defined metadata when using either the REST API or the AWS SDK...
- metadata encoding/decoding problem with non-ASCII ... - GitLab: This bug is very annoying for people using 8-bit character sets: one umlaut becomes two different characters. Another example is "Glühbirne" (light bulb)...
- Known problems and solutions for metadata import issues: In the Import Metadata workspace, when you use the ODBC connector to import a data source where the metadata contains non-ASCII characters, the...
- Cannot use non-ASCII letters in distutils if setuptools is used: The official supported way for non-ASCII characters in distutils is to use Unicode strings. If anything else fails, that's not a bug.
- How to handle special characters in metadata?: I have an implementation in C# to call ACS web services for a long time. Last week we realized that some special characters in...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I hit this issue also in the context of updating blob metadata for Azure Cognitive Search. It was a headache to debug and find a fix, but
metadataDictionary.Add("myKey", Uri.EscapeDataString(metadata_value))
followed by
blobClient.SetMetadata(metadataDictionary)
seems to work for pretty much all Unicode content. The encoding gets removed when finally stored in the metadata, and the content looks fine (character accents visible, etc.) when viewing the uploaded metadata from Storage Explorer.
One caveat is that you can still end up uploading metadata values that have trailing spaces (this is not permitted, and a warning is displayed in Storage Explorer when viewing the metadata). Before figuring out how to encode, a trailing space was raising a very cryptic error from SetMetadata.
It would be helpful to improve the docs and the exception error messages to suggest encoding the metadata values. Even better, the library could do the encoding for you. Adding a new method, or an optional escapeStrings=true parameter to the existing method, would help developers learn the expectations of the API.
A related issue concerns the lack of warning that filenames need to be escaped as well. The default behavior of the API should be revisited for all functions generating HTTP headers that might break in this way, and developers need to be made well aware, through docs, function comments, and/or optional method parameters, whether they need to do the encoding themselves.
BlobClient doesn’t properly escape filenames with certain characters such as ‘#’. https://github.com/Azure/azure-sdk-for-net/issues/11602
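The workaround above can be wrapped in a small helper. Below is a minimal sketch; the MetadataEscaping type and its method names are mine, not part of the SDK. It percent-encodes values and also trims trailing whitespace, to avoid the second pitfall the commenter mentions:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class MetadataEscaping
{
    // Percent-encode each value so it contains only ASCII characters,
    // and trim trailing whitespace, which the service rejects.
    public static Dictionary<string, string> Escape(Dictionary<string, string> metadata) =>
        metadata.ToDictionary(
            kv => kv.Key,
            kv => Uri.EscapeDataString(kv.Value.TrimEnd()));

    // Reverse the encoding after reading the metadata back.
    public static Dictionary<string, string> Unescape(Dictionary<string, string> metadata) =>
        metadata.ToDictionary(
            kv => kv.Key,
            kv => Uri.UnescapeDataString(kv.Value));
}

// Hypothetical usage with an already-configured BlobClient:
// blobClient.SetMetadata(MetadataEscaping.Escape(metadata));
```

Note that the application must apply Unescape itself when it reads the metadata back directly; only Storage Explorer happens to display the decoded form.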
You are correct; application metadata is sent to the service via HTTP request headers.
As Azure Blob Storage is fundamentally exposed as an HTTP service, the HTTP body is used to represent the contents of a blob, while HTTP headers are used to expose the properties and metadata of a blob. Using the HTTP headers in this way to carry properties/metadata is by design: blob properties and application metadata can then be accessed alongside blobs via a standard HTTP interface. Using the HTTP request body to carry metadata would not support these scenarios.
As noted earlier, carrying metadata in HTTP headers does imply certain restrictions on the supported character set. To conform with the behavior of most HTTP clients and the recommendation of the RFC, metadata is restricted to US-ASCII octets. (Note that metadata names are further restricted, as documented here.) From the RFC:
If you want to use the application metadata, any special characters will need to be encoded. Either URL-encoding or Base64-encoding would be acceptable; the choice is up to the application (and, in your case, whether Cognitive Search supports decoding metadata in this way, which I suspect it does not).
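As one illustration of the Base64 option, a round trip could look like the sketch below (the MetadataBase64 type and its method names are mine, not SDK functionality):

```csharp
using System;
using System.Text;

static class MetadataBase64
{
    // Encode an arbitrary Unicode value as an ASCII-only Base64 string,
    // which satisfies the HTTP header restriction.
    public static string Encode(string value) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(value));

    // Decode a Base64-encoded metadata value back to the original string.
    public static string Decode(string encoded) =>
        Encoding.UTF8.GetString(Convert.FromBase64String(encoded));
}
```

Because Base64 output is pure ASCII, the encoded value can be stored as-is; the trade-off versus URL-encoding is that the stored value is entirely opaque in tools such as Storage Explorer until decoded.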
The option I would recommend, though, would be to store the JSON metadata document alongside the original blob. For example, image.jpg could be accompanied by image.jpg.meta, which would store its metadata in a separate JSON blob.
Cognitive Search has good support for indexing blobs containing JSON. You can create a metadata field, e.g. metadata_original_blob_uri, that points back to the real file, and mark it as “Retrievable” (but not “Searchable”) in the indexer settings. This will give you full search over all the metadata, and your application can then retrieve the original URI of the source blob when you have a search match.
https://docs.microsoft.com/en-us/azure/search/search-howto-index-json-blobs
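The sidecar approach can be sketched as follows. The shape of the .meta document and the helper name are illustrative assumptions, not a prescribed schema; only metadata_original_blob_uri mirrors the field suggested above:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

static class SidecarMetadata
{
    // Serialize the metadata, plus a pointer back to the source blob,
    // as a JSON document to be stored next to the original blob
    // (e.g. image.jpg.meta alongside image.jpg). JSON bodies are UTF-8,
    // so no header character restriction applies.
    public static string ToJson(string blobUri, Dictionary<string, string> metadata)
    {
        var doc = new Dictionary<string, object>(StringComparer.Ordinal)
        {
            ["metadata_original_blob_uri"] = blobUri,
        };
        foreach (var kv in metadata)
            doc[kv.Key] = kv.Value;
        return JsonSerializer.Serialize(doc);
    }
}

// Hypothetical upload of the sidecar blob (assumes a configured BlobContainerClient):
// var metaBlob = containerClient.GetBlobClient("image.jpg.meta");
// metaBlob.Upload(BinaryData.FromString(json), overwrite: true);
```

The Cognitive Search JSON indexer can then index every field of the .meta blob, while the application follows metadata_original_blob_uri back to the binary source.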