CloudBlockBlob.DownloadText() handles UTF8 BOM incorrectly
CloudBlockBlob.DownloadText() behaves differently from File.ReadAllText with respect to the UTF-8 preamble (BOM).
Repro: Create an XML file in Visual Studio and upload it to a cloud blob container. The file will begin with a BOM (EF BB BF). Then download it using CloudBlockBlob.DownloadText() and pass the resulting string to XDocument.Parse. The parser fails with an XmlException: “Data at the root level is invalid. Line 1, position 1.” Failing code:
var storageAccount = CloudStorageAccount.Parse("connectionString");
var blobClient = storageAccount.CreateCloudBlobClient();
var container = blobClient.GetContainerReference("MyContainer");
var blob = container.GetBlockBlobReference("my.xml");
var s = blob.DownloadText();
var x = XDocument.Parse(s);
A workaround by Dave Cluderay, posted at http://stackoverflow.com/questions/2111586/parsing-xml-string-to-an-xml-document-fails-if-the-string-begins-with-xml, is to pass the downloaded string through a StreamReader, which detects and skips the BOM before the XML parser sees it. Working code:
var storageAccount = CloudStorageAccount.Parse("connectionString");
var blobClient = storageAccount.CreateCloudBlobClient();
var container = blobClient.GetContainerReference("MyContainer");
var blob = container.GetBlockBlobReference("my.xml");
var s = blob.DownloadText();
// Re-encode the string to bytes so that StreamReader can detect and drop the BOM.
using (var memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(s)))
{
    using (var streamReader = new StreamReader(memoryStream))
    {
        var x = XDocument.Load(streamReader);
    }
}
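The round trip through MemoryStream is not the only option. The sketch below assumes, as the repro suggests, that DownloadText() returns the decoded text with U+FEFF as its first character. It simulates the blob content with an in-memory byte array instead of calling the storage SDK, and the names BomDemo, StripBom, SampleBytes, ParseDecoded and LoadFromStream are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class BomDemo
{
    // Hypothetical helper: removes one leading U+FEFF if present.
    // The first char is compared directly, which is an ordinal check.
    public static string StripBom(string text)
    {
        return !string.IsNullOrEmpty(text) && text[0] == '\uFEFF'
            ? text.Substring(1)
            : text;
    }

    // Stand-in for the blob content: UTF-8 BOM (EF BB BF) followed by XML.
    public static byte[] SampleBytes()
    {
        byte[] preamble = Encoding.UTF8.GetPreamble();
        byte[] body = Encoding.UTF8.GetBytes("<root />");
        var bytes = new byte[preamble.Length + body.Length];
        preamble.CopyTo(bytes, 0);
        body.CopyTo(bytes, preamble.Length);
        return bytes;
    }

    // Option 1: decode to a string, strip the BOM character, then parse.
    public static XDocument ParseDecoded()
    {
        // Encoding.GetString keeps the BOM as a leading U+FEFF character.
        string s = Encoding.UTF8.GetString(SampleBytes());
        return XDocument.Parse(StripBom(s));
    }

    // Option 2: skip the string entirely and let the XML reader
    // detect the encoding (and BOM) from the raw bytes.
    public static XDocument LoadFromStream()
    {
        using (var stream = new MemoryStream(SampleBytes()))
        {
            return XDocument.Load(stream);
        }
    }
}
```

With the real SDK, option 1 would apply StripBom to the result of DownloadText(), and option 2 would presumably download the blob into a stream first and hand that stream to XDocument.Load.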
Moved from https://github.com/Azure/azure-sdk-for-net/issues/626#
Issue Analytics
- State:
- Created 9 years ago
- Comments: 7 (2 by maintainers)
Top GitHub Comments
@joefeser 👍 I’m way late to the party, but …
It would have been nice, though, to have an overload .DownloadText(bool stripByteOrderMark), so that the original method would just do the un-“surprising” thing and the overload would allow the dev to explicitly call for stripping the BOM.

I created an extension method DownloadString for CloudBlockBlob and used the same code that is used in WebClient.DownloadString. Here you can find the code: How to get rid of BOM when downloading text from azure blob
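The DownloadText(bool stripByteOrderMark) overload wished for in the comments is a proposal, not an existing SDK method. As a rough sketch of the idea, the helper below applies the same flag to raw bytes that have already been downloaded. BlobTextDecoding and DecodeText are hypothetical names, only UTF-8 is handled, and this is not the code from the linked answer.

```csharp
using System;
using System.Text;

public static class BlobTextDecoding
{
    // Hypothetical helper: decodes UTF-8 bytes and optionally skips
    // a leading UTF-8 BOM (EF BB BF) instead of returning it as U+FEFF.
    public static string DecodeText(byte[] content, bool stripByteOrderMark)
    {
        if (content == null) throw new ArgumentNullException(nameof(content));

        byte[] preamble = Encoding.UTF8.GetPreamble();
        int offset = 0;
        if (stripByteOrderMark && HasPrefix(content, preamble))
        {
            offset = preamble.Length;
        }
        return Encoding.UTF8.GetString(content, offset, content.Length - offset);
    }

    // Returns true when data begins with every byte of prefix.
    private static bool HasPrefix(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Keeping the flag explicit matches the intent of the comment: passing false preserves the current behavior, and passing true is an opt-in to stripping.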