When using hiveCatalog.dropTable(identifier, true), the table directory is not completely removed

When using hiveCatalog.dropTable(identifier, true) to drop an Iceberg table, the table directory is not completely removed. For example, before the table is dropped, its directory looks like this:

⇒  tree   /data/hive/warehouse/test/
/data/hive/warehouse/test/
├── data
│   └── ts_year=2020
│       ├── id_bucket=0
│       │   ├── 00000-0-4718ae1d-ee92-4a39-9c00-6225e791cc68-00001.parquet
│       │   ├── 00000-0-88059c29-5b0d-44da-a7e2-fd886f6ff04a-00001.parquet
│       │   ├── 00001-1-5ea45c12-b7e4-47f3-8fc3-c12849679b9f-00002.parquet
│       │   └── 00001-1-6c8baddb-d0dc-4c49-9d89-75e3c55bac83-00002.parquet
│       └── id_bucket=1
│           ├── 00001-1-5ea45c12-b7e4-47f3-8fc3-c12849679b9f-00001.parquet
│           └── 00001-1-6c8baddb-d0dc-4c49-9d89-75e3c55bac83-00001.parquet
└── metadata
    ├── 00000-aaa7a3d5-bb25-4d07-b28c-1e9b63ef8380.metadata.json
    ├── 00001-77ec6836-5709-44d6-a8aa-405588cc93df.metadata.json
    ├── 00002-21692627-9ba1-47ef-8729-d9cd96533ba5.metadata.json
    ├── 57b07bcc-e3a1-4684-a2b3-26263f2b0535-m0.avro
    ├── bff1d409-7d63-4b33-9d10-d7ebe7efe65c-m0.avro
    ├── snap-2855480055257649189-1-57b07bcc-e3a1-4684-a2b3-26263f2b0535.avro
    └── snap-8307457369176907400-1-bff1d409-7d63-4b33-9d10-d7ebe7efe65c.avro

5 directories, 13 files

After the table is dropped, the directory looks like this:

⇒  tree   /data/hive/warehouse/test/
/data/hive/warehouse/test/
├── data
│   └── ts_year=2020
│       ├── id_bucket=0
│       └── id_bucket=1
└── metadata
    ├── 00000-aaa7a3d5-bb25-4d07-b28c-1e9b63ef8380.metadata.json
    └── 00001-77ec6836-5709-44d6-a8aa-405588cc93df.metadata.json

5 directories, 2 files

I think the two remaining metadata files should also be deleted, because they are useless once the table is gone. The now-empty ts_year=2020/id_bucket=0 and ts_year=2020/id_bucket=1 directories should be deleted as well.

If an Iceberg table is dropped through the Hadoop catalog or with spark.sql("drop table xxx"), all directories associated with the table are deleted. We should make these behaviors consistent.
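As a stopgap for the leftover empty partition directories, the cleanup can be sketched with nothing but java.nio.file. This is a minimal sketch, not Iceberg code: it assumes the table location is on a local filesystem (a real warehouse on HDFS or object storage would need the Hadoop FileSystem API instead), and the class and method names EmptyDirPruner and pruneEmptyDirs are made up for illustration. It removes only directories that contain nothing, so the remaining metadata files are left alone.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Iceberg.
final class EmptyDirPruner {

    /**
     * Deletes every empty directory under root (root itself included),
     * deepest first, so a parent that becomes empty is removed too.
     * Returns the number of directories deleted.
     */
    static int pruneEmptyDirs(Path root) throws IOException {
        List<Path> dirs;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse path order visits children before their parents.
            dirs = walk
                .filter(p -> Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS))
                .sorted(Comparator.reverseOrder())
                .collect(Collectors.toList());
        }
        int deleted = 0;
        for (Path dir : dirs) {
            if (isEmpty(dir)) {
                Files.delete(dir);
                deleted++;
            }
        }
        return deleted;
    }

    // True when the directory has no entries at all.
    private static boolean isEmpty(Path dir) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            return !entries.iterator().hasNext();
        }
    }
}
```

Applied to the post-drop layout above, this would remove data/ and everything beneath it, while metadata/ stays because it still holds two files.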

Issue Analytics

State:
Created 3 years ago
Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction

rdblue commented, Dec 1, 2020

@zhangdove, there is no requirement in Iceberg that a table “owns” its location. The intent of not recursively deleting was to avoid dropping other data in the same prefix. I think it would be fine to drop the location recursively in some cases. Maybe that should be a catalog option?
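The catalog-option idea could look roughly like the sketch below. It is only an illustration under stated assumptions: the option key drop.recursive-delete-location and the LocationCleanup class are hypothetical and are not actual Iceberg properties or APIs, and the location is assumed to be a local path. Recursive deletion is off by default, so other data sharing the same prefix is never touched unless the user opts in.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Iceberg.
final class LocationCleanup {

    // Hypothetical option key; Iceberg does not define this property.
    static final String RECURSIVE_DELETE_OPTION = "drop.recursive-delete-location";

    /**
     * Recursively deletes the location only when the option is set to "true".
     * Returns true if the location was deleted, false if it was left alone.
     */
    static boolean cleanLocation(Path location, Map<String, String> options)
            throws IOException {
        boolean recursive = Boolean.parseBoolean(
            options.getOrDefault(RECURSIVE_DELETE_OPTION, "false"));
        if (!recursive || !Files.exists(location)) {
            return false; // default: never touch files the table may not own
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(location)) {
            // Reverse path order deletes children before their parents.
            paths = walk.sorted(Comparator.reverseOrder())
                        .collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }
}
```

Keeping the default at "false" preserves the existing behavior described in the comment above; only a catalog explicitly configured for it would remove the whole location.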

1 reaction

397090770 commented, Nov 26, 2020

Thanks for your reply. I will submit a PR to make these behaviors consistent.
