When using hiveCatalog.dropTable(identifier, true), the table directory is not completely removed
When using hiveCatalog.dropTable(identifier, true) to drop an Iceberg table, the table directory is not completely removed. For example, before the table is dropped, its data directory looks like this:
$ tree /data/hive/warehouse/test/
/data/hive/warehouse/test/
├── data
│   └── ts_year=2020
│       ├── id_bucket=0
│       │   ├── 00000-0-4718ae1d-ee92-4a39-9c00-6225e791cc68-00001.parquet
│       │   ├── 00000-0-88059c29-5b0d-44da-a7e2-fd886f6ff04a-00001.parquet
│       │   ├── 00001-1-5ea45c12-b7e4-47f3-8fc3-c12849679b9f-00002.parquet
│       │   └── 00001-1-6c8baddb-d0dc-4c49-9d89-75e3c55bac83-00002.parquet
│       └── id_bucket=1
│           ├── 00001-1-5ea45c12-b7e4-47f3-8fc3-c12849679b9f-00001.parquet
│           └── 00001-1-6c8baddb-d0dc-4c49-9d89-75e3c55bac83-00001.parquet
└── metadata
    ├── 00000-aaa7a3d5-bb25-4d07-b28c-1e9b63ef8380.metadata.json
    ├── 00001-77ec6836-5709-44d6-a8aa-405588cc93df.metadata.json
    ├── 00002-21692627-9ba1-47ef-8729-d9cd96533ba5.metadata.json
    ├── 57b07bcc-e3a1-4684-a2b3-26263f2b0535-m0.avro
    ├── bff1d409-7d63-4b33-9d10-d7ebe7efe65c-m0.avro
    ├── snap-2855480055257649189-1-57b07bcc-e3a1-4684-a2b3-26263f2b0535.avro
    └── snap-8307457369176907400-1-bff1d409-7d63-4b33-9d10-d7ebe7efe65c.avro
5 directories, 13 files
After the table is dropped, the data directory looks like this:
$ tree /data/hive/warehouse/test/
/data/hive/warehouse/test/
├── data
│   └── ts_year=2020
│       ├── id_bucket=0
│       └── id_bucket=1
└── metadata
    ├── 00000-aaa7a3d5-bb25-4d07-b28c-1e9b63ef8380.metadata.json
    └── 00001-77ec6836-5709-44d6-a8aa-405588cc93df.metadata.json
5 directories, 2 files
I think the two remaining metadata files should also be deleted, because they are no longer useful. The now-empty ts_year=2020/id_bucket=0 and ts_year=2020/id_bucket=1 directories should be deleted as well.
If an Iceberg table is dropped through the Hadoop catalog or with spark.sql("drop table xxx"), all directories associated with the table are deleted. We should make these behaviors consistent.
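To make the leftover state concrete, here is a minimal sketch of the kind of cleanup that would remove the empty partition directories shown above. It is not Iceberg code: the class EmptyDirPruner and the method pruneEmptyDirs are hypothetical names, and it assumes the table location is on a local filesystem reachable through java.nio.file rather than HDFS or an object store.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of Iceberg): prunes empty directories
// that remain under a table location after its files were deleted.
final class EmptyDirPruner {
    private EmptyDirPruner() {}

    /**
     * Deletes every empty directory under root, and root itself if it ends up
     * empty. Directories that still contain files are left alone.
     * Returns the number of directories removed.
     */
    static int pruneEmptyDirs(Path root) throws IOException {
        List<Path> dirs;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts children before their parents, so a parent
            // is only checked after its subdirectories have been handled.
            dirs = walk.filter(p -> Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS))
                       .sorted(Comparator.reverseOrder())
                       .collect(Collectors.toList());
        }
        int removed = 0;
        for (Path dir : dirs) {
            boolean empty;
            try (Stream<Path> children = Files.list(dir)) {
                empty = !children.findAny().isPresent();
            }
            if (empty) {
                Files.delete(dir);
                removed++;
            }
        }
        return removed;
    }
}
```

Applied to the post-drop layout above, this would remove id_bucket=0, id_bucket=1, ts_year=2020 and data, while metadata stays because it still holds two files. Those files would have to be deleted separately.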
@zhangdove, there is no requirement in Iceberg that a table "owns" its location. The intent of not recursively deleting was to avoid dropping other data in the same prefix. I think it would be fine to drop the location recursively in some cases. Maybe that should be a catalog option?
Thanks for your reply, I will submit a PR to make these behaviors consistent.
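One way to picture the catalog-option idea from the reply above is an opt-in flag that gates recursive deletion of the table location. The sketch below is only an illustration under assumptions: LocationCleaner, recursiveDelete and cleanLocation are hypothetical names, not an existing Iceberg API or catalog property, and it again works on a local filesystem through java.nio.file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: neither this class nor the flag exists in Iceberg.
final class LocationCleaner {
    // Off by default in spirit: a table does not necessarily own its
    // location, so recursive deletion must be requested explicitly.
    private final boolean recursiveDelete;

    LocationCleaner(boolean recursiveDelete) {
        this.recursiveDelete = recursiveDelete;
    }

    /** Removes the whole location only when the option is on; returns true if it was removed. */
    boolean cleanLocation(Path location) throws IOException {
        if (!recursiveDelete || !Files.exists(location)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(location)) {
            // Reverse order deletes files and subdirectories before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }
}
```

With the flag off, nothing under the location is touched, which matches the stated intent of not dropping unrelated data that shares the prefix. With it on, only the given location and its contents are removed.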