
[BUG] Wrong encoding when reading binary data from Azure blob storage

See original GitHub issue

It is not possible to correctly read binary data from a bound Azure Blob Storage input in a Python Azure Function when debugging locally. This issue is also described here: https://github.com/Azure/azure-functions-host/issues/4374. The problem appears to be caused by an incorrect UTF-8 decode of the binary data.

Investigative information

  • Core Tools version: 3.0.2630, 2.7.2628 (tested in both)

Repro steps

  1. Bind the Azure Function to an input blob parameter b_in
  2. Declare the function parameter as b_in: func.InputStream
  3. Get the binary data with binary = b_in.read() (see the sketch below)
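
A minimal sketch of those three steps (my reconstruction for illustration; it assumes a blob input binding named b_in is configured in function.json, and the full sample from the issue appears under Related information below):

import azure.functions as func

def main(req: func.HttpRequest, b_in: func.InputStream) -> func.HttpResponse:
    # Step 3: read the bound blob. When debugging locally, the first bytes
    # come back as EF BF BD ... instead of the blob's actual contents.
    binary = b_in.read()
    return func.HttpResponse(str(binary[:15]), status=200)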

Expected behavior


The binary data should correspond exactly to the data in the blob storage.

Actual behavior


The binary data is wrong. In my case, the first bytes of the file in the blob were 80 04 95 78 19 01 00 00 … (in hex), while the bytes obtained in the binary variable were EF BF BD 04 EF BF BD 19 01 00 00 … Note that they start with the UTF-8 encoding of the Unicode replacement character (EF BF BD), which strongly suggests that a lossy UTF-8 decode took place somewhere along the way.
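
This standalone snippet (my illustration of the suspected mechanism, not code from the issue) reproduces the corrupted prefix by round-tripping the pickle protocol-4 header through a lossy UTF-8 decode:

import pickle

# Hypothesis: the host decodes the blob as UTF-8 with replacement
# characters and re-encodes it. Round-tripping the pickle header 80 04 95
# yields exactly the corrupted prefix reported above.
header = b"\x80\x04\x95"
roundtripped = header.decode("utf-8", errors="replace").encode("utf-8")
print(roundtripped.hex(" "))  # ef bf bd 04 ef bf bd

# Unpickling the corrupted bytes then fails, because 0xEF is not a
# valid pickle opcode.
try:
    pickle.loads(roundtripped)
except pickle.UnpicklingError as e:
    print(e)  # invalid load key, '\xef'.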

Known workarounds


One can store the data base64-encoded (base64 output is pure ASCII, so it survives a lossy UTF-8 round trip), but that unnecessarily complicates the code; a sketch follows below.
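
A sketch of that workaround, assuming the blob is written by your own code (the names and the sample payload here are illustrative):

import base64
import pickle

# Writer side: base64-encode the pickled payload before uploading, so the
# blob contains only ASCII bytes and survives a lossy text decode.
payload = base64.b64encode(pickle.dumps({"Russia": [1, 2, 3]}))
# ... upload `payload` to the blob ...

# Function side: strip the base64 layer before unpickling.
data = pickle.loads(base64.b64decode(payload))
assert data["Russia"] == [1, 2, 3]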

Contents of the requirements.txt file:


azure-functions
pandas>=1.0.3
matplotlib
scipy
numpy

Related information

Code

function.json

{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "authLevel": "anonymous",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": [
        "get",
        "post"
      ]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    },
    {
      "type": "blob",
      "direction": "in",
      "name": "covblob",
      "path": "snapshot/current",
      "connection": "e2estore_STORAGE"
    }
  ]
}

__init__.py

import logging
import pickle

import azure.functions as func
import pandas as pd  # required so pickle can reconstruct the DataFrame objects


def main(req: func.HttpRequest, covblob: func.InputStream) -> func.HttpResponse:
    # Country to extract from the pickled dict; defaults to 'Russia'.
    country = req.params.get('country') or 'Russia'

    logging.info('covidata function triggered with country={} and blob with len={}'
                 .format(country, covblob.length))

    # Read the bound blob as raw bytes -- locally this is where the
    # corrupted EF BF BD prefix shows up.
    binary = covblob.read()
    logging.info("binary is {}, len={}".format(binary[:15], len(binary)))

    # The blob holds a pickled dict mapping country name -> DataFrame.
    data = pickle.loads(binary)
    df = data[country]

    # Return the selected country's data as CSV.
    res = df.to_csv()
    return func.HttpResponse(res, status=200)

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 9 (1 by maintainers)

Top GitHub Comments

1 reaction
shwars commented, Aug 28, 2020

@stefanushinardi @vrdmr - if you were able to get the tuple, it means you did not hit the error. I was getting an error on pickle.loads. I have never tried it with the local storage emulator; I was using real Azure storage. I will try to set up a sample Azure Function in the cloud and share it with you.

0 reactions
shwars commented, Sep 16, 2020

Might be an opportunity to add some docs on the data type property or increase visibility if it already exists. @ggailey777

I will propose some additions to this doc (which was the first one I came across when looking for details) once we have a bit more clarity on another issue with Python parameter typing. @stefanushinardi also suggested this doc, where dataType is described, but it was not easily discoverable for me.
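
For reference, the dataType setting discussed here is applied per binding in function.json. A sketch of what the blob binding from this issue might look like with it added (my addition, not from the thread):

{
  "type": "blob",
  "direction": "in",
  "name": "covblob",
  "path": "snapshot/current",
  "connection": "e2estore_STORAGE",
  "dataType": "binary"
}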
