Add Delete Files To Files metadata table


I think the current “files” table is sufficient to hold “delete file” information as well.

Today, the “entries” table already shows some delete file information, though in struct form. An example “entries” table row for a positional delete file:

|status|snapshot_id        |sequence_number|data_file|
|1     |3491280865879215816|2              |{1, file:/my_table_path/data/p=1/00129-2-47a2d786-6eaa-45c8-a7df-6bf1303553ec-00001.parquet, PARQUET, 0, {1}, 1, 1798, {2147483546 -> 186, 2147483545 -> 46}, {2147483546 -> 1, 2147483545 -> 1}, {2147483546 -> 0, 2147483545 -> 0}, {}, {2147483546 -> file:/my_table_path/data/p=1/00000-0-2043c59b-7a64-4c51-9a99-fb95f83ac076-00001.parquet, 2147483545 -> }, {2147483546 -> file:/my_table/data/p=1/00000-0-2043c59b-7a64-4c51-9a99-fb95f83ac076-00001.parquet, 2147483545 -> }, null, null, null, null}|

I think the current “files” table has an adequate schema to capture delete files metadata.

|content|   file_path|file_format|spec_id|partition|record_count|file_size_in_bytes|      column_sizes|    value_counts|null_value_counts|nan_value_counts|        lower_bounds|        upper_bounds|key_metadata|split_offsets|equality_ids|sort_order_id|
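To illustrate why the schema looks adequate, the 17 positional fields of the `data_file` struct in the example “entries” row line up one-to-one with the 17 “files” columns. The sketch below pairs them up in plain Python. The names `FILES_COLUMNS`, `DATA_FILE_STRUCT`, and `flatten_data_file` are made up for this illustration and are not Iceberg APIs. Map values are written as dicts, and bounds that print as blank in the row are written as empty strings.

```python
# Column names of the "files" metadata table, in the order quoted above.
FILES_COLUMNS = [
    "content", "file_path", "file_format", "spec_id", "partition",
    "record_count", "file_size_in_bytes", "column_sizes", "value_counts",
    "null_value_counts", "nan_value_counts", "lower_bounds", "upper_bounds",
    "key_metadata", "split_offsets", "equality_ids", "sort_order_id",
]

# Field values of the data_file struct from the example "entries" row,
# in positional order, copied as they appear in that row.
DATA_FILE_STRUCT = [
    1,  # content: 1 marks a position delete file in the Iceberg spec
    "file:/my_table_path/data/p=1/00129-2-47a2d786-6eaa-45c8-a7df-6bf1303553ec-00001.parquet",
    "PARQUET",
    0,
    (1,),  # partition struct {1}
    1,
    1798,
    {2147483546: 186, 2147483545: 46},
    {2147483546: 1, 2147483545: 1},
    {2147483546: 0, 2147483545: 0},
    {},
    {2147483546: "file:/my_table_path/data/p=1/00000-0-2043c59b-7a64-4c51-9a99-fb95f83ac076-00001.parquet",
     2147483545: ""},
    {2147483546: "file:/my_table/data/p=1/00000-0-2043c59b-7a64-4c51-9a99-fb95f83ac076-00001.parquet",
     2147483545: ""},
    None, None, None, None,
]


def flatten_data_file(struct_values):
    """Pair each positional struct field with its "files" column name."""
    if len(struct_values) != len(FILES_COLUMNS):
        raise ValueError("struct does not match the files schema")
    return dict(zip(FILES_COLUMNS, struct_values))


row = flatten_data_file(DATA_FILE_STRUCT)
for name, value in row.items():
    print(f"{name:>18}: {value}")
```

Every struct field finds a column, so a delete file can be shown as an ordinary “files” row whose `content` value is non-zero.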

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 14 (9 by maintainers)

Top GitHub Comments

4 reactions
aokolnychyi commented, Mar 4, 2022

The current plan:

  • Add DELETE_FILES
  • Add ALL_DELETE_FILES
  • Make FILES report both data and delete files
  • Add DATA_FILES
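
One way to picture the plan is as filters on the `content` column, which the Iceberg spec defines as 0 for data files, 1 for position deletes, and 2 for equality deletes. The sketch below is a plain-Python model of that idea, not Iceberg's implementation. The function names and sample rows are hypothetical, and ALL_DELETE_FILES is not modelled.

```python
from enum import IntEnum


class FileContent(IntEnum):
    """Values of the `content` column as defined in the Iceberg spec."""
    DATA = 0
    POSITION_DELETES = 1
    EQUALITY_DELETES = 2


def files(rows):
    """FILES: report both data and delete files."""
    return list(rows)


def data_files(rows):
    """DATA_FILES: only rows whose content is DATA."""
    return [r for r in rows if r["content"] == FileContent.DATA]


def delete_files(rows):
    """DELETE_FILES: position and equality delete files."""
    deletes = (FileContent.POSITION_DELETES, FileContent.EQUALITY_DELETES)
    return [r for r in rows if r["content"] in deletes]


# Hypothetical rows; real rows carry the full "files" schema.
SAMPLE_ROWS = [
    {"content": 0, "file_path": "data-a.parquet"},
    {"content": 1, "file_path": "pos-deletes-b.parquet"},
    {"content": 2, "file_path": "eq-deletes-c.parquet"},
]

print([r["file_path"] for r in data_files(SAMPLE_ROWS)])
print([r["file_path"] for r in delete_files(SAMPLE_ROWS)])
```

Under this model, DATA_FILES and DELETE_FILES partition FILES, so existing readers of FILES that only expect data files would need to filter on `content`.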

3 reactions
szehon-ho commented, Mar 4, 2022

Makes sense to me, thanks guys (more opinions are welcome).

For completeness, I think we could also add ALL_FILES at some point, so that users can conveniently see the total reachability graph of an Iceberg table.
