Multiple prefixes in GoogleCloudStorageListOperator and GoogleCloudStorageDeleteOperator
Description
Support passing multiple prefixes to the GoogleCloudStorageListOperator and GoogleCloudStorageDeleteOperator operators.
Use case / motivation
I have this folder structure in a GCS bucket.
+-- year={year}
|   +-- month={month}
|       +-- day={day}
|       |   +-- topic={topic1}
|       |   |   +-- file 1
|       |   |   +-- file 2
|       |   |   +-- file 3
|       |   +-- topic={topic2}
|       |   |   +-- file 1
|       |   |   +-- file 2
|       |   |   +-- file 3
|       |   +-- topic={topic3}
|       |   |   +-- file 1
|       |   |   +-- file 2
|       |   |   +-- file 3
|       |   +-- topic={topic4}
|       |   |   +-- file 1
|       |   |   +-- file 2
|       |   |   +-- file 3
|       |   +-- topic={topic5}
|       |   |   +-- file 1
|       |   |   +-- file 2
|       |   |   +-- file 3
|       |   +-- topic={topic6}
|       |   |   +-- file 1
|       |   |   +-- file 2
|       |   |   +-- file 3
|       |   +-- topic={topic7}
|       +-- day={day}
|       |   +-- topic={topic1}
|       |   |   +-- file 1
|       |   |   +-- file 2
|       |   |   +-- file 3
|       |   +-- topic={topic2}
|       |   |   +-- file 1
|       |   |   +-- file 2
|       |   |   +-- file 3
|       |   +-- topic={topic3}
|       |   |   +-- file 1
|       |   |   +-- file 2
|       |   |   +-- file 3
|       |   +-- topic={topic4}
|       |   |   +-- file 1
|       |   |   +-- file 2
|       |   |   +-- file 3
|       |   +-- topic={topic5}
|       |   |   +-- file 1
|       |   |   +-- file 2
|       |   |   +-- file 3
|       |   +-- topic={topic6}
|       |   |   +-- file 1
|       |   |   +-- file 2
|       |   |   +-- file 3
|       |   +-- topic={topic7}
|       |       +-- file 1
|       |       +-- file 2
|       |       +-- file 3
|       ....
What I need to achieve is to delete one day of objects. For example, I need to delete the objects in year=2020/month=08/day=19. I can do that easily using gsutil: in gsutil you can delete them via a wildcard, year=2020/month=08/day=19/*, but using the REST APIs you can't, even if you use a prefix. The reason is that there is no single prefix to get all the objects inside a folder. I achieved that by using multiple prefixes, getting the list of objects for each prefix. Unfortunately, I can't pass more than one prefix to the operators.
Prefixes used
year=2020/month=08/day=19/topic={topic1}
year=2020/month=08/day=19/topic={topic2}
year=2020/month=08/day=19/topic={topic3}
year=2020/month=08/day=19/topic={topic4}
year=2020/month=08/day=19/topic={topic5}
year=2020/month=08/day=19/topic={topic6}
year=2020/month=08/day=19/topic={topic7}
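The workaround described above (one listing call per prefix, with the results merged) can be sketched roughly as follows. This is only an illustration: `list_objects_for_prefixes`, `fake_list`, and the bucket contents are hypothetical names and data, and `list_fn` stands in for whatever single-prefix listing call is available.

```python
from typing import Callable, Iterable, List


def list_objects_for_prefixes(
    list_fn: Callable[[str], Iterable[str]],
    prefixes: Iterable[str],
) -> List[str]:
    """Call list_fn once per prefix and merge the results, keeping first-seen order."""
    seen = set()
    merged: List[str] = []
    for prefix in prefixes:
        for name in list_fn(prefix):
            # Overlapping prefixes could return the same object twice; keep one copy.
            if name not in seen:
                seen.add(name)
                merged.append(name)
    return merged


# Hypothetical bucket contents, used only to make the sketch runnable.
FAKE_BUCKET = [
    "year=2020/month=08/day=19/topic=a/file1",
    "year=2020/month=08/day=19/topic=a/file2",
    "year=2020/month=08/day=19/topic=b/file1",
    "year=2020/month=08/day=20/topic=a/file1",
]


def fake_list(prefix: str) -> List[str]:
    """Stand-in for a real single-prefix listing call."""
    return [name for name in FAKE_BUCKET if name.startswith(prefix)]


day = "year=2020/month=08/day=19"
prefixes = [f"{day}/topic=a", f"{day}/topic=b"]
objects = list_objects_for_prefixes(fake_list, prefixes)
```

In an Airflow task, `list_fn` could presumably wrap the GCS hook's single-prefix listing, and the merged names could then be deleted one by one; that wiring is an assumption and is not shown here.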
Issue Analytics
- State:
- Created 3 years ago
- Comments:10 (5 by maintainers)
Opened by EmadMokhtar as airflow issue #10426.
prefix is a parameter of list_blobs (https://googleapis.dev/python/storage/latest/client.html). Even if you modify the parameter on the hook, in the end you will still be able to utilize only a single prefix each time. You can modify prefix to accept Union[str, List[str]]; that way the modification is also backward compatible. This has some similarities to the approach suggested in https://github.com/apache/airflow/issues/15001
I want to, but I'm facing issues setting up the dev environment for Airflow. I will give it another try in the upcoming week.
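As a rough illustration of the backward-compatible parameter suggested in the comments, a prefix argument that accepts either a single string or a list could be normalized before the per-prefix listing calls. The helper name `normalize_prefixes` is hypothetical and is not part of Airflow or the Google client library.

```python
from typing import List, Optional, Sequence, Union


def normalize_prefixes(
    prefix: Union[str, Sequence[str], None],
) -> List[Optional[str]]:
    """Turn a str, a sequence of str, or None into a list of prefixes.

    None becomes [None], i.e. a single unfiltered listing call, so existing
    callers that pass nothing or a single string keep their behaviour.
    """
    if prefix is None:
        return [None]
    if isinstance(prefix, str):
        # A bare string must be checked before the generic sequence case,
        # otherwise it would be split into characters.
        return [prefix]
    return list(prefix)


single = normalize_prefixes("year=2020/month=08/day=19/")
several = normalize_prefixes(["day=19/topic=a", "day=19/topic=b"])
unfiltered = normalize_prefixes(None)
```

The caller would then loop over the returned list and issue one single-prefix listing per entry, since the underlying listing call takes only one prefix at a time.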