
[Question] Cannot use non-ASCII chars in metadata?

See original GitHub issue

Describe the bug: It is not possible to write blob metadata containing non-ASCII characters.

Also affects Microsoft Azure Storage SDK for .NET (11.0.0). https://github.com/Azure/azure-storage-net/issues/975

Expected behavior: You should be able to save blob metadata containing non-ASCII characters.

Actual behavior (include Exception or Stack Trace):

System.AggregateException: 'Retry failed after 6 tries.4.0,(.NET Core 3.1.1; Microsoft Windows 10.0.18363)'

RequestFailedException: Request headers must contain only ASCII characters.

This exception was originally thrown at this call stack:
	System.Net.Http.HttpConnection.WriteStringAsync(string)
	System.Net.Http.HttpConnection.WriteHeadersAsync(System.Net.Http.Headers.HttpHeaders, string)
	System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
	System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(System.Threading.Tasks.Task)
	System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
	System.Runtime.CompilerServices.ConfiguredTaskAwaitable.ConfiguredTaskAwaiter.GetResult()
	System.Net.Http.HttpConnection.SendAsyncCore(System.Net.Http.HttpRequestMessage, System.Threading.CancellationToken)
	System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
	System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(System.Threading.Tasks.Task)
	System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
    ...
    [Call Stack Truncated]

To Reproduce: Write metadata containing a non-ASCII character (such as "ñ") to a blob.

using System;
using System.Collections.Generic;
using Azure.Storage.Blobs;

namespace BlobExperiment
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Hello World!");
            const string connectionString =
                "DefaultEndpointsProtocol=https;AccountName=<account-name>;AccountKey=<account-key>;EndpointSuffix=core.windows.net";

            var client = new BlobContainerClient(connectionString, "test");
            var blobClient = client.GetBlobClient("test.jpg");

            var metadata = new Dictionary<string, string>();
            metadata.Add("Test", "ñ"); // The notorious ñ.

            blobClient.SetMetadata(metadata);
        }
    }
}

Environment:

  • Azure.Storage.Blobs 12.4.0
  • Windows 10, .NET Core 3.1.101
  • Visual Studio 16.4.6

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 9 (2 by maintainers)

Top GitHub Comments

2 reactions
kganjam commented, Apr 26, 2020

I hit this issue too, while updating blob metadata for Azure Cognitive Search. It was a headache to debug and find a fix, but metadataDictionary.Add("myKey", Uri.EscapeDataString(metadata_value)) followed by blobClient.SetMetadata(metadataDictionary) seems to work for pretty much all Unicode content. The encoding gets removed when the value is finally stored in the metadata, and the content looks fine (character accents visible, etc.) when viewing the uploaded metadata from Storage Explorer.
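That workaround can be sketched as follows. This is a minimal illustration of the Uri.EscapeDataString approach described above; the class and method names are made up for the example, and are not part of the Azure SDK.

```csharp
using System;
using System.Collections.Generic;

class MetadataEscaper
{
    // Percent-encode a metadata value so it contains only ASCII characters,
    // which HTTP headers (and therefore blob metadata) require.
    public static string Escape(string value) => Uri.EscapeDataString(value);

    // Reverse the encoding after reading the metadata back.
    public static string Unescape(string value) => Uri.UnescapeDataString(value);

    static void Main()
    {
        var metadata = new Dictionary<string, string>
        {
            ["Test"] = Escape("ñ") // stored as "%C3%B1" -- ASCII-safe
        };

        // Every value is now pure ASCII, so blobClient.SetMetadata(metadata)
        // would no longer trip the "must contain only ASCII characters" check.
        Console.WriteLine(metadata["Test"]);           // %C3%B1
        Console.WriteLine(Unescape(metadata["Test"])); // ñ
    }
}
```

Readers of the metadata have to apply Uri.UnescapeDataString symmetrically; nothing in the service itself knows the values are encoded.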

One caveat is that you can still end up uploading metadata values that have trailing spaces (this is not permitted, and a warning is displayed in Storage Explorer when viewing the metadata). Before I figured out the encoding, a trailing space was raising a very cryptic error from SetMetadata:

—> Azure Storage Blob: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature
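Both pitfalls (trailing whitespace and non-ASCII content) can be guarded against in one pass before calling SetMetadata. A sketch, with an illustrative helper name of my own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class MetadataSanitizer
{
    // Trailing whitespace is not permitted in metadata values, and non-ASCII
    // characters cannot travel in HTTP headers, so trim first, then
    // percent-encode what remains.
    public static Dictionary<string, string> Sanitize(IDictionary<string, string> metadata) =>
        metadata.ToDictionary(
            kv => kv.Key,
            kv => Uri.EscapeDataString(kv.Value.TrimEnd()));
}
```

Usage would be blobClient.SetMetadata(MetadataSanitizer.Sanitize(metadata)); the keys are left untouched here, since metadata names have their own, stricter rules.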

It would be helpful to improve the docs and the exception messages quoted above and below to suggest encoding the metadata values. Even better, the library could do the encoding for you. An additional method, or an optional escapeStrings=true parameter on the existing method, would help developers learn the expectations of the API.

—> Azure.RequestFailedException: Request headers must contain only ASCII characters.
—> System.Net.Http.HttpRequestException: Request headers must contain only ASCII characters.
   at System.Net.Http.HttpConnection.WriteStringAsync(String s)

A related issue, linked below, concerns the lack of warning that filenames need to be escaped as well. The default behavior should be revisited for every API function that generates HTTP headers which can break this way, and developers need to be made aware, through docs, function comments, and/or optional method parameters, when they must do the encoding themselves.

BlobClient doesn’t properly escape filenames with certain characters such as ‘#’. https://github.com/Azure/azure-sdk-for-net/issues/11602

1 reaction
ryanerdmann commented, Apr 23, 2020

You are correct; application metadata is sent to the service via HTTP request headers.

As Azure Blob Storage is fundamentally exposed as an HTTP service, the HTTP body content is used to represent the contents of a blob while HTTP headers are used to expose the properties and metadata of a blob. Using the HTTP headers in this way to carry properties/metadata is by design, and enables the following scenarios:

  • Upload or download of blob contents (request body) and blob metadata (request headers) together in a single operation (HTTP PUT/GET on a blob resource)
  • Retrieval of metadata for a blob, independent of its contents, via an HTTP HEAD operation (which does not retrieve the body content)

Blob properties and application metadata are thus designed so they can be accessed alongside blobs via a standard HTTP interface. Using the HTTP request body to carry metadata would not support these important scenarios.
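In the Azure.Storage.Blobs client, the HEAD-based path described above is exposed as GetProperties. The following sketch assumes a real connection string and an existing blob; the placeholder values are illustrative.

```csharp
using System;
using Azure.Storage.Blobs;

class Program
{
    static void Main()
    {
        // Placeholder connection string -- substitute real credentials.
        var blobClient = new BlobClient(
            "DefaultEndpointsProtocol=https;AccountName=<account-name>;AccountKey=<account-key>;EndpointSuffix=core.windows.net",
            "test",
            "test.jpg");

        // GetProperties issues an HTTP HEAD request: the properties and
        // application metadata come back as response headers, without
        // transferring the blob body at all.
        var properties = blobClient.GetProperties().Value;
        foreach (var kv in properties.Metadata)
            Console.WriteLine($"{kv.Key} = {kv.Value}");
    }
}
```

This is the concrete reason metadata rides in headers: the same values are available whether or not you download the content.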

As noted earlier, the use of HTTP headers does imply certain restrictions on the supported character set for metadata, as they must be carried in an HTTP header. To conform with the behavior of most HTTP clients and the recommendation from the RFC, metadata is restricted to US-ASCII octets. (Note that metadata names are further restricted, as documented here). From the RFC:

Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data. RFC 7230 3.2.4

If you want to use the application metadata, any special characters will need to be encoded. Either URL-encoding or Base64-encoding would be acceptable; the choice is up to the application (and in your case, whether Cognitive Search supports decoding metadata in this way, which I suspect it does not).
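The Base64 variant of that encoding can be sketched as follows, using UTF-8 as the intermediate byte encoding; the helper names are illustrative, not SDK API.

```csharp
using System;
using System.Text;

static class Base64Metadata
{
    // Base64 over UTF-8 bytes: the result is always ASCII, so it is safe
    // to carry in an HTTP header.
    public static string Encode(string value) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(value));

    // Symmetric decode for the reader of the metadata.
    public static string Decode(string value) =>
        Encoding.UTF8.GetString(Convert.FromBase64String(value));
}
```

For example, Encode("ñ") yields "w7E=". Unlike URL-encoding, Base64 obscures even the ASCII portions of the value, so URL-encoding is usually friendlier when humans browse the metadata in Storage Explorer.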

The option I would recommend, though, would be to store the JSON metadata document alongside the original blob. For example, image.jpg could be accompanied by image.jpg.meta, which would store its metadata in a separate JSON blob.
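A sketch of that sidecar approach, assuming Azure.Storage.Blobs 12.x (for the BinaryData upload overload); the ".meta" naming convention and helper names are illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using Azure.Storage.Blobs;

static class SidecarMetadata
{
    // JSON travels in the request *body*, which is UTF-8, so non-ASCII
    // values need no escaping -- unlike header-borne blob metadata.
    public static string ToJson(Dictionary<string, string> metadata) =>
        JsonSerializer.Serialize(metadata);

    // Upload the metadata as a sibling blob, e.g. image.jpg -> image.jpg.meta.
    public static void Upload(BlobContainerClient container, string blobName,
                              Dictionary<string, string> metadata)
    {
        container.GetBlobClient(blobName + ".meta")
                 .Upload(BinaryData.FromString(ToJson(metadata)), overwrite: true);
    }
}
```

The original blob keeps only ASCII-safe (or no) header metadata, while the full Unicode metadata lives in the .meta blob that Cognitive Search can index as JSON.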

Cognitive Search has good support for indexing blobs containing JSON. You can create a metadata field, e.g. metadata_original_blob_uri, that points back to the real file, and mark this as “Retrievable” (but not “Searchable”) in the indexer settings. This will give you full search over all the metadata, and your application can then retrieve the original URI to the source blob when you have a search match.

https://docs.microsoft.com/en-us/azure/search/search-howto-index-json-blobs


Top Results From Across the Web

AWS SDK .NET custom metadata with non-ASCII characters
1 Answer. You cannot use non-ASCII characters for S3 user-defined metadata when using either REST API or the AWS SDK (since AWS SDK...
metadata encoding/decoding problem with non-ASCII ... - GitLab
This bug is very annoying for people using 8-bit character sets: One umlaut becomes two different characters. Another example is "Glühbirne" (light bulb) ......
Known problems and solutions for metadata import issues
In the Import Metadata workspace, when you use the ODBC connector to import a data source where the metadata contains non-ASCII characters, the...
Cannot use non-ascii letters in disutils if setuptools is used.
The official supported way for non-ASCII characters in distutils is to use Unicode strings. If anything else fails, that's not a bug.
How to handle special characters in metadata?
I have an implementation in C# to call ACS webservices for a long time. Last week we realized that some special characters in...
