STORAGE: Bucket.delete_blobs() should use Batch

This will also allow Bucket.delete(force=True) to use Batch()

Issue Analytics

  • State: closed
  • Created 9 years ago
  • Comments: 17 (10 by maintainers)

Top GitHub Comments

The-Fonz commented, Oct 16, 2019

Came here while trying to open a new issue about this. My use case is deleting or moving millions of blobs. Currently I use one main thread that lists all blobs in a bucket and puts the ones to be deleted on a queue. A few dozen worker threads then delete each blob individually. I’m averaging about 60 deletes/sec, and adding more threads doesn’t help. That’s a bit slow for millions of blobs (it takes days to run).

I’m trying to use the Batch class for this, but it is not entirely clear to me how to use it correctly, or whether it even supports deletes. I get a header parsing error when I try to batch deletes.
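
The setup described above can be sketched roughly as follows. This is a minimal illustration, not the commenter’s actual code: `delete_one` is a hypothetical stand-in for the real per-blob delete call, and the worker count and queue size are arbitrary.

```python
import queue
import threading

_SENTINEL = object()  # tells a worker thread to stop


def delete_with_workers(blob_names, delete_one, num_workers=4):
    """Feed names to worker threads through a queue.

    `delete_one` is a hypothetical callable standing in for the real
    per-blob delete request. Returns (deleted_count, errors).
    """
    work = queue.Queue(maxsize=1000)
    lock = threading.Lock()
    deleted = 0
    errors = []

    def worker():
        nonlocal deleted
        while True:
            name = work.get()
            if name is _SENTINEL:
                break
            try:
                delete_one(name)  # still one request per blob
            except Exception as exc:  # keep the worker alive on failure
                with lock:
                    errors.append((name, exc))
            else:
                with lock:
                    deleted += 1

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for name in blob_names:  # the "main thread" producer
        work.put(name)
    for _ in threads:  # one stop signal per worker
        work.put(_SENTINEL)
    for thread in threads:
        thread.join()
    return deleted, errors
```

Every delete in this design is still its own request, which is the cost that batching is meant to reduce.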

It would be great if the bucket.delete_blobs() method (and other related methods for that matter) would use a batch by default. The current code is:

    def delete_blobs(self, blobs, on_error=None, client=None):
        """Deletes a list of blobs from the current bucket.

        Uses :meth:`delete_blob` to delete each individual blob.

        If :attr:`user_project` is set, bills the API request to that project.

        :type blobs: list
        :param blobs: A list of :class:`~google.cloud.storage.blob.Blob`-s or
                      blob names to delete.

        :type on_error: callable
        :param on_error: (Optional) Takes single argument: ``blob``. Called
                         once for each blob raising
                         :class:`~google.cloud.exceptions.NotFound`;
                         otherwise, the exception is propagated.

        :type client: :class:`~google.cloud.storage.client.Client`
        :param client: (Optional) The client to use.  If not passed, falls back
                       to the ``client`` stored on the current bucket.

        :raises: :class:`~google.cloud.exceptions.NotFound` (if
                 `on_error` is not passed).
        """
        for blob in blobs:
            try:
                blob_name = blob
                if not isinstance(blob_name, six.string_types):
                    blob_name = blob.name
                self.delete_blob(blob_name, client=client)
            except NotFound:
                if on_error is not None:
                    on_error(blob)
                else:
                    raise

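The batch version might look something like this (just a draft, not tested):

    def delete_blobs(self, blobs, on_error=None, client=None):
        # Draft: on_error is unused here. Batched requests are deferred, so
        # errors would likely surface when the batch is sent, not per call.
        blobs = list(blobs)
        start = 0
        while start < len(blobs):
            # One batch per chunk; the batch is sent when the with-block exits
            with self.client.batch() as batch:
                chunk = blobs[start:start + batch._MAX_BATCH_SIZE]
                for blob in chunk:
                    blob_name = blob
                    if not isinstance(blob_name, six.string_types):
                        blob_name = blob.name
                    self.delete_blob(blob_name, client=client)
            start += len(chunk)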
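
The “send the batch off when full” idea can be exercised in isolation by splitting the input into fixed-size chunks and opening one batch context per chunk. The sketch below shows only that control flow: `open_batch` and `delete_one` are hypothetical stand-ins for `client.batch()` and `bucket.delete_blob()`, and the default `batch_size` of 100 is an assumed value, not one taken from the library.

```python
import contextlib
import itertools


def chunked(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        chunk = list(itertools.islice(iterator, size))
        if not chunk:
            return
        yield chunk


def delete_in_batches(blob_names, delete_one, open_batch, batch_size=100):
    """Call `delete_one` for every name, using one batch context per chunk.

    `open_batch` must return a context manager (hypothetical stand-in for
    a client's batch factory). Returns the number of batches opened.
    """
    batches = 0
    for chunk in chunked(blob_names, batch_size):
        with open_batch():  # requests in this block belong to one batch
            for name in chunk:
                delete_one(name)
        batches += 1
    return batches


# Dry run with no network access: nullcontext plays the role of the batch.
removed = []
batch_count = delete_in_batches(
    ["a", "b", "c", "d", "e"], removed.append, contextlib.nullcontext, batch_size=2
)
```

With the real library this might be called as `delete_in_batches(names, bucket.delete_blob, client.batch)`, though that combination is untested here.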

lukesneeringer commented, Aug 11, 2017

Hello,

One of the challenges of maintaining a large open source project is that sometimes, you can bite off more than you can chew. As the lead maintainer of google-cloud-python, I can definitely say that I have let the issues here pile up.

As part of trying to get things under control (as well as to empower us to provide better customer service in the future), I am declaring a “bankruptcy” of sorts on many of the old issues, especially those likely to have been addressed or made obsolete by more recent updates.

My goal is to close stale issues whose relevance or solution is no longer immediately evident, and which appear to be of lower importance. I believe in good faith that this is one of those issues, but I am scanning quickly and may occasionally be wrong. If this is an issue of high importance, please comment here and we will reconsider. If this is an issue whose solution is trivial, please consider providing a pull request.

Thank you!
