CloudBlockBlob.DownloadText() handles UTF8 BOM incorrectly

CloudBlockBlob.DownloadText() behaves differently from File.ReadAllText with respect to the UTF-8 preamble/BOM.

Repro: Create an XML file in Visual Studio and upload it to a cloud blob container. The file will begin with a UTF-8 BOM (EF BB BF). Then download it using CloudBlockBlob.DownloadText() and pass the resulting string to XDocument.Parse. The parser fails with an XmlException: “Data at the root level is invalid. Line 1, position 1.” Failing code:

var storageAccount = CloudStorageAccount.Parse("connectionString");
var blobClient = storageAccount.CreateCloudBlobClient();
var container = blobClient.GetContainerReference("MyContainer");
var blob = container.GetBlockBlobReference("my.xml");
// The returned string still starts with the BOM character (U+FEFF).
var s = blob.DownloadText();
// Throws XmlException because of the leading U+FEFF.
var x = XDocument.Parse(s);
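
The difference can be reproduced without a storage account. The following is a minimal sketch, assuming DownloadText() decodes the blob bytes much like Encoding.UTF8.GetString does (an assumption, not confirmed in this thread). The class name BomRepro and the inline `<root/>` document are illustrative only.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

class BomRepro
{
    static void Main()
    {
        // UTF-8 BOM (EF BB BF) followed by a tiny XML document.
        byte[] bom = { 0xEF, 0xBB, 0xBF };
        byte[] body = Encoding.UTF8.GetBytes("<root/>");
        byte[] bytes = new byte[bom.Length + body.Length];
        bom.CopyTo(bytes, 0);
        body.CopyTo(bytes, bom.Length);

        // Decoding the raw bytes keeps the BOM as the character U+FEFF.
        string raw = Encoding.UTF8.GetString(bytes);
        Console.WriteLine("raw starts with U+FEFF: " + (raw[0] == '\uFEFF'));

        try
        {
            XDocument.Parse(raw);
        }
        catch (XmlException ex)
        {
            // The parser rejects the character in front of the root element.
            Console.WriteLine("Parse failed: " + ex.Message);
        }

        // File.ReadAllText detects the BOM and does not return it.
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, bytes);
        string fromFile = File.ReadAllText(path);
        File.Delete(path);

        Console.WriteLine("file text starts with '<': " + (fromFile[0] == '<'));
        XDocument.Parse(fromFile); // parses without error
    }
}
```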

A workaround suggested by Dave Cluderay at http://stackoverflow.com/questions/2111586/parsing-xml-string-to-an-xml-document-fails-if-the-string-begins-with-xml is to pass the downloaded string through a StreamReader. Working code:

var storageAccount = CloudStorageAccount.Parse("connectionString");
var blobClient = storageAccount.CreateCloudBlobClient();
var container = blobClient.GetContainerReference("MyContainer");
var blob = container.GetBlockBlobReference("my.xml");
var s = blob.DownloadText();
// Re-encoding turns the leading U+FEFF back into the bytes EF BB BF.
using (var memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(s)))
{
    // StreamReader detects the BOM and skips it before the XML is parsed.
    using (var streamReader = new StreamReader(memoryStream))
    {
        var x = XDocument.Load(streamReader);
    }
}
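
If the round trip through a MemoryStream feels heavy, another option is to drop a leading U+FEFF from the string directly. This is a minimal sketch, not something proposed in the issue: BomText and StripBom are hypothetical names, and the literal string stands in for the value returned by blob.DownloadText().

```csharp
using System;
using System.Xml.Linq;

static class BomText
{
    // Hypothetical helper: removes a single leading U+FEFF if one is present.
    public static string StripBom(string text)
    {
        if (!string.IsNullOrEmpty(text) && text[0] == '\uFEFF')
        {
            return text.Substring(1);
        }
        return text;
    }
}

class Demo
{
    static void Main()
    {
        // Stand-in for the string returned by blob.DownloadText().
        string s = "\uFEFF<root/>";
        XDocument x = XDocument.Parse(BomText.StripBom(s));
        Console.WriteLine(x.Root.Name); // prints "root"
    }
}
```

Only the first character is checked, so a U+FEFF appearing later in the text is left untouched.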

Moved from https://github.com/Azure/azure-sdk-for-net/issues/626#

Issue Analytics

  • State: closed
  • Created 9 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

8 reactions
guardrex commented, Dec 19, 2015

@joefeser 👍 I’m way late to the party, but …

We could detect and strip out the BOM ourselves, but this is a potentially surprising transformation of the blob’s original data

It would have been nice though to have an overload .DownloadText(bool stripByteOrderMark), so that the original method would just do the un-“surprising” thing and the overload would allow the dev to explicitly call for stripping the BOM.
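
The overload described above does not exist in the SDK, but something similar could be approximated with an extension method. The sketch below assumes the legacy WindowsAzure.Storage package, where CloudBlockBlob lives in the Microsoft.WindowsAzure.Storage.Blob namespace; the class name CloudBlockBlobBomExtensions and the overload itself are hypothetical.

```csharp
using Microsoft.WindowsAzure.Storage.Blob; // assumption: legacy WindowsAzure.Storage package

public static class CloudBlockBlobBomExtensions
{
    // Hypothetical overload, not part of the SDK: the caller opts in to BOM stripping.
    public static string DownloadText(this CloudBlockBlob blob, bool stripByteOrderMark)
    {
        // The SDK's own DownloadText() returns the text with any BOM intact.
        string text = blob.DownloadText();

        if (stripByteOrderMark && text.Length > 0 && text[0] == '\uFEFF')
        {
            return text.Substring(1);
        }
        return text;
    }
}
```

With this in scope, a call such as blob.DownloadText(true) would strip the BOM, while blob.DownloadText() keeps the SDK's existing behavior.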

2 reactions
r-aghaei commented, Mar 6, 2018

I created an extension method DownloadString for CloudBlockBlob and used the same code which is used in WebClient.DownloadString. Here you can find the code: How to get rid of BOM when downloading text from azure blob
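
The linked code is not reproduced in this thread. As a rough sketch in the same spirit, and not the commenter's actual implementation, the raw bytes can be decoded by a StreamReader with BOM detection enabled, so the BOM is consumed rather than returned. TextDecoding and DecodeSkippingBom are hypothetical names, and UTF-8 is assumed as the fallback encoding.

```csharp
using System;
using System.IO;
using System.Text;

static class TextDecoding
{
    // Hypothetical helper: decodes bytes, letting a BOM (if present) select the
    // encoding and leaving it out of the result; falls back to UTF-8 otherwise.
    public static string DecodeSkippingBom(byte[] data)
    {
        using (var stream = new MemoryStream(data))
        using (var reader = new StreamReader(stream, Encoding.UTF8, detectEncodingFromByteOrderMarks: true))
        {
            return reader.ReadToEnd();
        }
    }
}

class Demo
{
    static void Main()
    {
        // EF BB BF followed by "<root/>" in UTF-8.
        byte[] data = { 0xEF, 0xBB, 0xBF, 0x3C, 0x72, 0x6F, 0x6F, 0x74, 0x2F, 0x3E };
        Console.WriteLine(TextDecoding.DecodeSkippingBom(data)); // prints "<root/>"
    }
}
```

In the legacy SDK the bytes would presumably come from downloading the blob into a stream (for example via DownloadToStream) instead of calling DownloadText(), so the string is never built with the BOM in the first place.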
