
When using hiveCatalog.dropTable(identifier, true), the table directory is not completely removed


When using hiveCatalog.dropTable(identifier, true) to drop an Iceberg table, the table directory is not completely removed. For example, before the table is dropped, its directory looks like this:

⇒ tree /data/hive/warehouse/test/
/data/hive/warehouse/test/
├── data
│   └── ts_year=2020
│       ├── id_bucket=0
│       │   ├── 00000-0-4718ae1d-ee92-4a39-9c00-6225e791cc68-00001.parquet
│       │   ├── 00000-0-88059c29-5b0d-44da-a7e2-fd886f6ff04a-00001.parquet
│       │   ├── 00001-1-5ea45c12-b7e4-47f3-8fc3-c12849679b9f-00002.parquet
│       │   └── 00001-1-6c8baddb-d0dc-4c49-9d89-75e3c55bac83-00002.parquet
│       └── id_bucket=1
│           ├── 00001-1-5ea45c12-b7e4-47f3-8fc3-c12849679b9f-00001.parquet
│           └── 00001-1-6c8baddb-d0dc-4c49-9d89-75e3c55bac83-00001.parquet
└── metadata
    ├── 00000-aaa7a3d5-bb25-4d07-b28c-1e9b63ef8380.metadata.json
    ├── 00001-77ec6836-5709-44d6-a8aa-405588cc93df.metadata.json
    ├── 00002-21692627-9ba1-47ef-8729-d9cd96533ba5.metadata.json
    ├── 57b07bcc-e3a1-4684-a2b3-26263f2b0535-m0.avro
    ├── bff1d409-7d63-4b33-9d10-d7ebe7efe65c-m0.avro
    ├── snap-2855480055257649189-1-57b07bcc-e3a1-4684-a2b3-26263f2b0535.avro
    └── snap-8307457369176907400-1-bff1d409-7d63-4b33-9d10-d7ebe7efe65c.avro

5 directories, 13 files

After the table is dropped, the directory looks like this:

⇒ tree /data/hive/warehouse/test/
/data/hive/warehouse/test/
├── data
│   └── ts_year=2020
│       ├── id_bucket=0
│       └── id_bucket=1
└── metadata
    ├── 00000-aaa7a3d5-bb25-4d07-b28c-1e9b63ef8380.metadata.json
    └── 00001-77ec6836-5709-44d6-a8aa-405588cc93df.metadata.json

5 directories, 2 files

I think the two remaining metadata files should also be deleted, since they are no longer useful, and the now-empty ts_year=2020/id_bucket=0 and ts_year=2020/id_bucket=1 directories should be removed as well.
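The empty-directory part of this request can be approximated as a cleanup step run after the drop. The following is a minimal sketch using only java.nio.file, assuming the table location is a local path as the tree output above suggests. `pruneEmptyDirectories` is a hypothetical helper, not part of the Iceberg API. It deliberately removes only directories that are empty, so it leaves the two metadata files alone.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PruneEmptyDirs {

    // Hypothetical helper (not an Iceberg API): removes every empty directory
    // under root, deepest first, and returns the directories it removed.
    // Directories that still contain files, and root itself, are kept.
    public static List<Path> pruneEmptyDirectories(Path root) throws IOException {
        List<Path> dirs;
        try (Stream<Path> walk = Files.walk(root)) {
            dirs = walk.filter(Files::isDirectory)
                    // deepest first, so a parent is checked after its children
                    .sorted((a, b) -> Integer.compare(b.getNameCount(), a.getNameCount()))
                    .collect(Collectors.toList());
        }
        List<Path> removed = new ArrayList<>();
        for (Path dir : dirs) {
            if (dir.equals(root)) {
                continue; // never remove the table location itself
            }
            boolean empty;
            try (Stream<Path> entries = Files.list(dir)) {
                empty = !entries.findAny().isPresent();
            }
            if (empty) {
                Files.delete(dir);
                removed.add(dir);
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        // Recreate the post-drop layout from the issue in a temporary directory.
        Path table = Files.createTempDirectory("test");
        Files.createDirectories(table.resolve("data/ts_year=2020/id_bucket=0"));
        Files.createDirectories(table.resolve("data/ts_year=2020/id_bucket=1"));
        Path metadata = Files.createDirectories(table.resolve("metadata"));
        Files.createFile(metadata.resolve("00000-aaa7a3d5-bb25-4d07-b28c-1e9b63ef8380.metadata.json"));
        Files.createFile(metadata.resolve("00001-77ec6836-5709-44d6-a8aa-405588cc93df.metadata.json"));

        List<Path> removed = pruneEmptyDirectories(table);
        System.out.println("removed " + removed.size() + " directories");
        System.out.println("data exists: " + Files.exists(table.resolve("data")));
        System.out.println("metadata exists: " + Files.exists(metadata));
    }
}
```

On that layout the sketch removes both id_bucket directories, then ts_year=2020, then data, and keeps metadata because it still holds files. Deciding which metadata files are safe to delete would require reading the table metadata, which this sketch does not attempt.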

When an Iceberg table is dropped through the Hadoop catalog or with spark.sql("drop table xxx"), all directories associated with the table are deleted. We should make these behaviors consistent.

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
rdblue commented, Dec 1, 2020

@zhangdove, there is no requirement in Iceberg that a table “owns” its location. The intent of not recursively deleting was to avoid dropping other data in the same prefix. I think it would be fine to drop the location recursively in some cases. Maybe that should be a catalog option?
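The catalog option suggested here can be sketched as a flag that switches between deleting only the files a table tracks and deleting its whole location. This is a minimal sketch under stated assumptions: the property name `drop-location-recursively` and the `dropTableFiles` helper are hypothetical and do not exist in Iceberg, the location is a local path, and the list of tracked files stands in for what a real catalog would read from table metadata.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DropLocationOption {

    // Hypothetical catalog property name; Iceberg does not define this option.
    public static final String DROP_LOCATION_RECURSIVELY = "drop-location-recursively";

    // Deletes the files the table tracks. Only when the option is "true" does it
    // also delete everything else under the table location, and the location itself.
    public static void dropTableFiles(Map<String, String> catalogProps, Path location,
                                      List<Path> trackedFiles) throws IOException {
        for (Path file : trackedFiles) {
            Files.deleteIfExists(file);
        }
        boolean recursive = Boolean.parseBoolean(
                catalogProps.getOrDefault(DROP_LOCATION_RECURSIVELY, "false"));
        if (!recursive || !Files.exists(location)) {
            return; // default: do not assume the table owns its location
        }
        List<Path> all;
        try (Stream<Path> walk = Files.walk(location)) {
            // Reverse lexicographic order visits children before their parents.
            all = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : all) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path location = Files.createTempDirectory("test");
        Path tracked = Files.createFile(location.resolve("tracked.parquet"));
        Path unrelated = Files.createFile(location.resolve("unrelated.txt"));

        // Option off (default): only the tracked file is removed.
        dropTableFiles(Map.of(), location, List.of(tracked));
        System.out.println("unrelated kept: " + Files.exists(unrelated));

        // Option on: the whole location is removed.
        dropTableFiles(Map.of(DROP_LOCATION_RECURSIVELY, "true"), location, List.of());
        System.out.println("location exists: " + Files.exists(location));
    }
}
```

Defaulting the flag to false keeps the behavior described in the comment, where other data sharing the prefix is never touched, while users who know a location holds only one table's files can opt in to a full recursive delete.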

1 reaction
397090770 commented, Nov 26, 2020

Thanks for your reply. I will submit a PR to make these behaviors consistent.


