[BUG] Wrong encoding when reading binary data from Azure blob storage
It is not possible to correctly read binary data from a bound Azure Blob Storage input in a Python Azure Function when debugging locally. This issue is also described here: https://github.com/Azure/azure-functions-host/issues/4374. The problem appears to be related to incorrect UTF-8 decoding of the binary data.
Investigative information
Please provide the following:
- Core Tools version: 3.0.2630, 2.7.2628 (tested in both)
Repro steps
Provide the steps required to reproduce the problem:
- Bind the Azure Function to an input blob parameter b_in
- Specify the parameter to the function as b_in: func.InputStream
- Get the binary data as binary = b_in.read()
Expected behavior
Provide a description of the expected behavior.
Binary data should directly correspond to the data in the blob storage
Actual behavior
Provide a description of the actual behavior observed.
The binary data is wrong. In my case, the first bytes in the binary file were 80 04 95 78 19 01 00 00 … (in hex), while the bytes obtained in the binary variable are EF BF BD 04 EF BF BD 19 01 00 00 … Note that they start with EF BF BD, the UTF-8 encoding of the Unicode replacement character U+FFFD, which most probably means that UTF-8 decoding took place.
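The byte pattern reported above is consistent with a lossy UTF-8 round trip. The following is a minimal sketch using only the Python standard library. It assumes the blob bytes are decoded as UTF-8 with invalid sequences replaced and then encoded again somewhere between storage and the function (the issue does not confirm where this happens). The name utf8_round_trip is a hypothetical helper, not part of any Azure library.

```python
def utf8_round_trip(raw: bytes) -> bytes:
    """Decode as UTF-8, replacing invalid sequences, then encode again.

    Assumption: this mimics what happens if the blob content is treated
    as text on its way to the function. It is not the actual host code.
    """
    return raw.decode("utf-8", errors="replace").encode("utf-8")


# First bytes of the blob as reported in the issue.
original = bytes.fromhex("80 04 95 78 19 01 00 00")
mangled = utf8_round_trip(original)

# 0x80 and 0x95 are not valid UTF-8 lead bytes, so each one is replaced
# by U+FFFD, which encodes to the three bytes EF BF BD.
print(" ".join(f"{b:02X}" for b in mangled))
```

The replacement is not reversible, so the original bytes cannot be recovered from the mangled data, which would explain a failure in pickle.loads.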
Known workarounds
Provide a description of any known workarounds.
One can store the data in base64 encoding, but that would unnecessarily complicate the code.
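A rough sketch of that base64 workaround, assuming the blob holds a pickled object as in the code below. The helper names encode_for_blob and decode_from_blob and the sample dictionary are made up for illustration. Because base64 output is pure ASCII, it should survive an unwanted UTF-8 decode and re-encode unchanged.

```python
import base64
import pickle


def encode_for_blob(obj) -> bytes:
    """Pickle an object and wrap it in base64 so the blob is pure ASCII."""
    return base64.b64encode(pickle.dumps(obj))


def decode_from_blob(payload: bytes):
    """Reverse encode_for_blob: base64-decode, then unpickle."""
    return pickle.loads(base64.b64decode(payload))


sample = {"Russia": [1, 2, 3]}  # placeholder data, not from the issue
payload = encode_for_blob(sample)

# Simulate the suspected text round trip; ASCII passes through unchanged.
survived = payload.decode("utf-8", errors="replace").encode("utf-8")
restored = decode_from_blob(survived)
```

In the function itself this would mean calling decode_from_blob(covblob.read()) in place of pickle.loads(binary), and base64-encoding the data whenever the blob is written.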
Contents of the requirements.txt file:
Provide the requirements.txt file to help us find out module related issues.
azure-functions
pandas>=1.0.3
matplotlib
scipy
numpy
Related information
Provide any related information
Code
function.json
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "authLevel": "anonymous",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": [
        "get",
        "post"
      ]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    },
    {
      "type": "blob",
      "direction": "in",
      "name": "covblob",
      "path": "snapshot/current",
      "connection": "e2estore_STORAGE"
    }
  ]
}
__init__.py
import logging
import pandas as pd
import pickle
import azure.functions as func


def main(req: func.HttpRequest, covblob: func.InputStream) -> func.HttpResponse:
    country = req.params.get('country') or 'Russia'
    logging.info('covidata function triggered with country={} and blob with len={}'.format(country, covblob.length))
    binary = covblob.read()
    logging.info("binary is {}, len={}".format(binary[:15], len(binary)))
    data = pickle.loads(binary)
    df = data[country]
    res = df.to_csv()
    return func.HttpResponse(res, status=200)
Issue Analytics
- Created 3 years ago
- Comments:9 (1 by maintainers)

@stefanushinardi @vrdmr - if you were able to get the tuple, it means that you have not experienced the error. I was getting an error when calling pickle.loads. I have never tried it with the local storage emulator; I was using real Azure Storage. I will try to set up a sample Azure Function in the cloud and share it with you.
I will propose some additions to this doc (which was the first one I came across when looking for details) once we have a bit more clarity on another issue with Python parameter typing. @stefanushinardi also suggested this doc, where dataType is described, but it was not easily discoverable for me.
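The dataType property mentioned above is set on a binding in function.json. As a sketch, assuming that the value "binary" is what makes the worker pass raw bytes through (the thread does not spell this out), the blob binding from the repro could be patched as follows. The helper with_binary_blob_inputs is hypothetical and only manipulates the configuration as a plain dictionary.

```python
import copy
import json


def with_binary_blob_inputs(config: dict) -> dict:
    """Return a copy of a function.json dict with dataType set to
    "binary" on every blob input binding.

    Assumption: "binary" is the dataType value that avoids text decoding.
    """
    patched = copy.deepcopy(config)
    for binding in patched.get("bindings", []):
        if binding.get("type") == "blob" and binding.get("direction") == "in":
            binding["dataType"] = "binary"
    return patched


# Trimmed version of the function.json from the issue.
config = {
    "scriptFile": "__init__.py",
    "bindings": [
        {"type": "http", "direction": "out", "name": "$return"},
        {
            "type": "blob",
            "direction": "in",
            "name": "covblob",
            "path": "snapshot/current",
            "connection": "e2estore_STORAGE",
        },
    ],
}

patched = with_binary_blob_inputs(config)
print(json.dumps(patched["bindings"][1], indent=2))
```

Editing function.json by hand to add the same key to the blob binding has the same effect; the helper only shows where the property goes.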