
azure-storage-blob : BlobClient creates an unwanted subdirectory inside the container

See original GitHub issue
  • Package Name: azure-storage-blob
  • Package Version: 12.12.0
  • Operating System: Windows
  • Python Version: 3.10.4

Describe the bug
When creating a blob inside a container with BlobClient, the blob is created, but inside a subfolder having the same name as the container. Note: I use BlobClient directly, and the “account_url” parameter is not the URL of the whole storage account but the URL of the container (with a SAS token). In fact, I do not really understand why the “container_name” parameter is mandatory if the connection string already points to the container address (see the additional context of this post).

–EDIT– Just to emphasize my last sentence about the “container_name” argument to the BlobClient constructor. It has a strange behaviour: if I put a random dummy value there, it will create a subfolder with that name in the right container (because the right container is specified in the connection string)… (see the additional context of this post.)
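For what it's worth, the behaviour described here is consistent with the client simply concatenating its URL parts. Below is a minimal plain-Python sketch of that assumption (`blob_url` is a hypothetical helper, not part of the SDK), showing how passing a container URL as `account_url` would duplicate the container segment:

```python
# Sketch (assumption): BlobClient appears to build the blob endpoint as
# account_url + "/" + container_name + "/" + blob_name. If account_url
# already points at the container, the container segment is duplicated
# and shows up as a virtual "subfolder".
def blob_url(account_url: str, container_name: str, blob_name: str) -> str:
    return f"{account_url.rstrip('/')}/{container_name}/{blob_name}"

# Container-scoped URL passed as account_url (SAS query string omitted):
print(blob_url("https://xxxxxx.blob.core.windows.net/testcont",
               "testcont", "retest.test"))
# → https://xxxxxx.blob.core.windows.net/testcont/testcont/retest.test
```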

To Reproduce
Steps to reproduce the behavior:

def az_blob_storage(connection_string, az_container):
    blob_client = BlobClient(account_url=connection_string, container_name=az_container, blob_name="retest.test")

    # Upload the created file
    with open(Path("test.test"), "rb") as data:
        try:
            blob_client.upload_blob(data=data, overwrite=True)
        except Exception as e:
            print(e)

az_blob_url_sas_token_connection_string = "https://xxxxxx.blob.core.windows.net/testcont?sp=racwdl&st=2022-06-09T14:05:02Z&se=2022-06-09T22:05:02Z&sip=xxxxxxx&spr=https&sv=2021-06-08&sr=c&sig=xxxxxxxx"
az_blob_storage(az_blob_url_sas_token_connection_string, "testcont")

Expected behavior
A file “retest.test” is created inside the container, at the root of the container.

What I got

Indeed the retest.test is created but inside a folder having the same name as the container:

  • container: “testcont”
    • subfolder ??? “testcont”
      • the file : “retest.test”


Additional context

Most of the examples show the connection string as the connection string of the whole Storage Account. I would like to narrow it down to the container for security reasons; this is why I use a connection string scoped to this exact container. In fact, BlobClient should not need the container name if it is provided in the connection string. For example, if I write this: blob_client = BlobClient(account_url=connection_string, container_name="blabla", blob_name="retest.test") then a subfolder named “blabla” is created… but still in the right container (because the container name is specified in the connection string).

Issue Analytics

  • State:closed
  • Created a year ago
  • Comments:6 (3 by maintainers)

Top GitHub Comments

1 reaction
stockersky commented, Jun 15, 2022

Actually, I ended up trying this with create_append_blob(); it seems to be the right way of doing it. It works in chunks: pretty fast, and no objects growing in memory.

import io

import bson  # from the pymongo package
from azure.storage.blob import ContainerClient

def _backup_collection_azure(self, collection, connection_string):
    container_client = ContainerClient.from_container_url(
        container_url=connection_string, max_single_put_size=64
    )
    blob_client = container_client.get_blob_client(blob=f"{collection}.bson")
    blob_client.create_append_blob()

    # All document "_id" values in a list
    all_docs_id = [item.get('_id') for item in self.database[collection].find({})]

    CHUNK_SIZE = 500
    # List of CHUNK_SIZE-sized sub-lists of document "_id" values
    list_chunked = [all_docs_id[i:i + CHUNK_SIZE] for i in range(0, len(all_docs_id), CHUNK_SIZE)]

    for chunk in list_chunked:
        # A fresh buffer per chunk keeps memory usage bounded
        with io.BytesIO() as buffer:
            for doc in self.database[collection].find({"_id": {"$in": chunk}}):
                buffer.write(bson.BSON.encode(doc))
            buffer.seek(0)
            blob_client.append_block(data=buffer)

    print("done")
0 reactions
stockersky commented, Jun 14, 2022

Hi @vincenttran-msft and @jalauzon-msft. Thanks a lot for your precious support.

Yes, this is very interesting. I have a first Proof of Concept that works well: the point is to manage backup & restore of a CosmosDB database with the MongoDB API.

Here is a little code snippet:

def _backup_collection_azure(self, collection, connection_string):
    # Initialise the Azure client
    container_client = ContainerClient.from_container_url(
        container_url=connection_string, max_single_put_size=64
    )

    blob_client = container_client.get_blob_client(blob=f"{collection}.bson")

    print(f"Backup collection { collection } to stream...", end=" ", flush=True)
    with io.BytesIO() as buf:
        for doc in self.database[collection].find():
            buf.write(bson.BSON.encode(doc))
        buf.seek(0)
        print("done")

        print("Upload to Azure Blob Storage...", end=" ", flush=True)
        try:
            blob_client.upload_blob(data=buf, overwrite=True)
        except Exception as e:
            print(e)
        print("done")

As you can see, my use of the BytesIO object is really sub-optimal because, in the end, it holds the whole database in memory. The database is small for now, but it’s growing!

What I’d like to achieve is to hold only each MongoDB doc in the BytesIO object and upload it in “append mode” to the blob object. That is probably more optimal for memory use, maybe not for Blob Storage, but I’ll find out…

I also tried with “stage_block()” and “commit_block_list()”, based on this example:

block_list = []
print("Upload to Azure Blob Storage...", end=" ", flush=True)
for doc in self.database[collection].find():
    print("one doc")
    try:
        block_id = str(uuid.uuid4())
        blob_client.stage_block(block_id=block_id, data=bson.BSON.encode(doc))
        block_list.append(BlobBlock(block_id))
    except Exception as e:
        print("Error writing to Blob Storage: STAGING")
        print(e.__class__.__name__)
        print(e)
try:
    ret = blob_client.commit_block_list(block_list)
    # print(ret)
except Exception as e:
    print("Error writing to Blob Storage: COMMIT")
    print(e.__class__.__name__)
    print(e)

Definitely very slow! And I don’t really see where I would save memory, as the list will contain all the data in the end…

I found this “AppendBlobService” object in the Python Azure SDK. Is this the right lib to use for this use case? I’ll give it a try.

---- EDIT ---- “AppendBlobService” is deprecated and no longer part of the azure-storage-blob package…


Top Results From Across the Web

Microsoft Azure: How to create sub directory in a blob container
To add on to what Egon said, simply create your blob called "folder/1.txt", and it will work. No need to create a directory....

How to Create a Sub Directory in a Blob Container in Microsoft ...
Luckily, Azure Blob storage offers an easy way to store large files in the cloud. This article will show you how to create...

Cheat Sheet: Microsoft Azure Blob Storage - Zuar
Simply navigate to the subscription and storage account then right-click 'Blob Containers' and select 'Create Blob Container' and name it.

Creating an Azure Blob Hierarchy | Azure Tips and Tricks
The goal of this exercise is to create a blob hierarchy or folder structure inside of our container. So for example, we'd like...

How to create a sub directory in a blob container - Edureka
looping through a containers blobs and checking the type. The code below is in C# CloudBlobContainer container = blobClient.
