
azure-storage-blob : BlobClient creates an unwanted subdirectory inside the container

See original GitHub issue
  • Package Name: azure-storage-blob
  • Package Version: 12.12.0
  • Operating System: Windows
  • Python Version: 3.10.4

Describe the bug
When creating a blob inside a container with BlobClient, the blob is created, but inside a subfolder having the same name as the container. Note: I use BlobClient directly, and the “account_url” parameter is not the URL of the whole storage account but the URL of the container (with a SAS token). In fact, I do not really understand why the “container_name” parameter is mandatory if the connection string already points to the container address (see the additional context of this post).

–EDIT– Just to emphasize my last sentence about the “container_name” argument to the BlobClient constructor. It has a strange behaviour: if I put a random dummy value there, it will create a subfolder with that name in the right container (because the right container is specified in the connection string)… (see the additional context of this post.)
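For what it's worth, the behaviour described here is consistent with the client simply concatenating its URL parts. Below is a minimal plain-Python sketch of that assumption (`blob_url` is a hypothetical helper, not part of the SDK), showing how passing a container URL as `account_url` would duplicate the container segment:

```python
# Sketch (assumption): BlobClient appears to build the blob endpoint as
# account_url + "/" + container_name + "/" + blob_name. If account_url
# already points at the container, the container segment is duplicated
# and shows up as a virtual "subfolder".
def blob_url(account_url: str, container_name: str, blob_name: str) -> str:
    return f"{account_url.rstrip('/')}/{container_name}/{blob_name}"

# Container-scoped URL passed as account_url (SAS query string omitted):
print(blob_url("https://xxxxxx.blob.core.windows.net/testcont",
               "testcont", "retest.test"))
# → https://xxxxxx.blob.core.windows.net/testcont/testcont/retest.test
```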

To Reproduce
Steps to reproduce the behavior:

def az_blob_storage(connection_string, az_container):
    blob_client = BlobClient(account_url=connection_string, container_name=az_container, blob_name="retest.test")

    # Upload the created file
    with open(Path("test.test"), "rb") as data:
        try:
            blob_client.upload_blob(data=data, overwrite=True)
        except Exception as e:
            print(e)

az_blob_url_sas_token_connection_string = "https://xxxxxx.blob.core.windows.net/testcont?sp=racwdl&st=2022-06-09T14:05:02Z&se=2022-06-09T22:05:02Z&sip=xxxxxxx&spr=https&sv=2021-06-08&sr=c&sig=xxxxxxxx"
az_blob_storage(az_blob_url_sas_token_connection_string, "testcont")

Expected behavior
A file “retest.test” is created inside the container, at the root of the container.

What I got

Indeed the retest.test is created but inside a folder having the same name as the container:

  • container: “testcont”
    • subfolder ??? “testcont”
      • the file : “retest.test”


Additional context

Most of the examples show the connection string as the connection string of the whole Storage Account. I would like to narrow it down to the container for security reasons; this is why I use a connection string scoped to this exact container. In fact, BlobClient should not need the container name if it is provided in the connection string. For example, if I write this: blob_client = BlobClient(account_url=connection_string, container_name="blabla", blob_name="retest.test") then a subfolder named “blabla” is created… but still in the right container (because the container name is specified in the connection string).

Issue Analytics

  • State:closed
  • Created a year ago
  • Comments:6 (3 by maintainers)

Top GitHub Comments

1 reaction
stockersky commented, Jun 15, 2022

Actually, I ended up trying this with create_append_blob(); it seems to be the right way of doing it. It works in chunks: pretty fast, and no objects growing in memory.

import io

import bson  # from the pymongo package
from azure.storage.blob import ContainerClient

def _backup_collection_azure(self, collection, connection_string):
    container_client = ContainerClient.from_container_url(
        container_url=connection_string, max_single_put_size=64
    )
    blob_client = container_client.get_blob_client(blob=f"{collection}.bson")
    blob_client.create_append_blob()

    # All document "_id" values in a list
    all_docs_id = [item.get('_id') for item in self.database[collection].find({})]

    CHUNK_SIZE = 500
    # List of CHUNK_SIZE-sized sub-lists of document "_id" values
    list_chunked = [all_docs_id[i:i + CHUNK_SIZE] for i in range(0, len(all_docs_id), CHUNK_SIZE)]

    for chunk in list_chunked:
        # A fresh buffer per chunk keeps memory usage bounded
        with io.BytesIO() as buffer:
            for doc in self.database[collection].find({"_id": {"$in": chunk}}):
                buffer.write(bson.BSON.encode(doc))
            buffer.seek(0)
            blob_client.append_block(data=buffer)

    print("done")
0 reactions
stockersky commented, Jun 14, 2022

Hi @vincenttran-msft and @jalauzon-msft. Thanks a lot for your precious support.

Yes, this is very interesting. I have a first Proof of Concept that works well: the point is to manage backup & restore of a CosmosDB database with the MongoDB API.

Here is a little code snippet:

def _backup_collection_azure(self, collection, connection_string):
    # Initialise the Azure client
    container_client = ContainerClient.from_container_url(
        container_url=connection_string, max_single_put_size=64
    )

    blob_client = container_client.get_blob_client(blob=f"{collection}.bson")

    print(f"Backup collection { collection } to stream...", end=" ", flush=True)
    with io.BytesIO() as buf:
        for doc in self.database[collection].find():
            buf.write(bson.BSON.encode(doc))
        buf.seek(0)
        print("done")

        print("Upload to Azure Blob Storage...", end=" ", flush=True)
        try:
            blob_client.upload_blob(data=buf, overwrite=True)
        except Exception as e:
            print(e)
        print("done")

As you can see, my use of the BytesIO object is really sub-optimal because, in the end, it holds the whole database in memory. The database is small for now, but it’s growing!

What I’d like to achieve is to hold only each MongoDB doc in the BytesIO object and upload it in “append mode” to the blob object. That is probably more optimal for memory use, maybe not for Blob Storage, but I’ll find out…

I also tried with “stage_block()” and “commit_block_list()”, based on this example:

block_list = []
print("Upload to Azure Blob Storage...", end=" ", flush=True)
for doc in self.database[collection].find():
    print("one doc")
    try:
        block_id = str(uuid.uuid4())
        blob_client.stage_block(block_id=block_id, data=bson.BSON.encode(doc))
        block_list.append(BlobBlock(block_id))
    except Exception as e:
        print("Error writing to Blob Storage: STAGING")
        print(e.__class__.__name__)
        print(e)
try:
    ret = blob_client.commit_block_list(block_list)
    # print(ret)
except Exception as e:
    print("Error writing to Blob Storage: COMMIT")
    print(e.__class__.__name__)
    print(e)

Definitely very slow! And I don’t really see where I would save memory, as the list will contain all the data in the end…

I found this “AppendBlobService” object in the Python Azure SDK. Is this the right lib to use for this use case? I’ll give it a try.

---- EDIT ---- “AppendBlobService” is deprecated and no longer part of the azure-storage-blob package…


Top Results From Across the Web

Microsoft Azure: How to create sub directory in a blob container
To add on to what Egon said, simply create your blob called "folder/1.txt", and it will work. No need to create a directory....

How to Create a Sub Directory in a Blob Container in Microsoft ...
Luckily, Azure Blob storage offers an easy way to store large files in the cloud. This article will show you how to create...

Cheat Sheet: Microsoft Azure Blob Storage - Zuar
Simply navigate to the subscription and storage account then right-click 'Blob Containers' and select 'Create Blob Container' and name it.

Creating an Azure Blob Hierarchy | Azure Tips and Tricks
The goal of this exercise is to create a blob hierarchy or folder structure inside of our container. So for example, we'd like...

How to create a sub directory in a blob container - Edureka
looping through a containers blobs and checking the type. The code below is in C# CloudBlobContainer container = blobClient.
