RepairManifestsAction

A while ago @aokolnychyi had suggested a new action “RepairManifestsAction”, and I wanted to discuss prior to starting to look at it.

The initial idea is an action that reads data files and uses them to repair Iceberg table manifests. An example use case would be after a metadata bug fix, such as #1980 (incorrect file size recorded in the manifest file).

So one proposal is to have a RepairManifestsAction with the typical APIs, like:

    actions.repairManifests()
        .filter(Expression expr)
        .caseSensitive(boolean value)  // filter out data files that this action will read

And have the following options, which would produce different levels of Spark jobs depending on how much of each data file they read:

    withRepairFileSize(boolean value)      // Reads file sizes from FileSystem, rewrites manifests
    withRepairRowCount(boolean value)      // Opens the file and counts rows, rewrites manifests with the row count
    withRepairMetrics(boolean value)       // Opens the file, and from the file metadata rewrites the manifest with latest metrics
    withRemoveDeletedFiles(boolean value)  // If any data file referenced by a manifest is removed, remove it from the manifest
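To make the "different levels of Spark jobs" idea concrete, here is a minimal Java sketch of how the proposed flags could determine how deeply a job has to read each data file. Only the `with...` option names come from the proposal; the class `RepairManifestsOptionsSketch`, the `ReadLevel` enum, and the assumed cost ordering (counting rows is treated as the most expensive step) are hypothetical and not part of Iceberg.

```java
// Hypothetical sketch of the proposed options; this is not an existing Iceberg API.
final class RepairManifestsOptionsSketch {

  /** How much of each data file a repair job would have to touch, cheapest first (assumed ordering). */
  enum ReadLevel { NONE, FILE_STATUS, FILE_METADATA, FILE_ROWS }

  private boolean repairFileSize;
  private boolean repairRowCount;
  private boolean repairMetrics;
  private boolean removeDeletedFiles;

  RepairManifestsOptionsSketch withRepairFileSize(boolean value) {
    this.repairFileSize = value;
    return this;
  }

  RepairManifestsOptionsSketch withRepairRowCount(boolean value) {
    this.repairRowCount = value;
    return this;
  }

  RepairManifestsOptionsSketch withRepairMetrics(boolean value) {
    this.repairMetrics = value;
    return this;
  }

  RepairManifestsOptionsSketch withRemoveDeletedFiles(boolean value) {
    this.removeDeletedFiles = value;
    return this;
  }

  /** Returns the most expensive read level required by any enabled option. */
  ReadLevel requiredReadLevel() {
    if (repairRowCount) {
      return ReadLevel.FILE_ROWS;      // opens the file and counts rows
    }
    if (repairMetrics) {
      return ReadLevel.FILE_METADATA;  // opens the file and reads its metadata
    }
    if (repairFileSize || removeDeletedFiles) {
      return ReadLevel.FILE_STATUS;    // only asks the file system about the file
    }
    return ReadLevel.NONE;
  }
}
```

Under these assumptions, enabling only `withRepairFileSize(true)` and `withRemoveDeletedFiles(true)` would need nothing more than file system lookups, while adding `withRepairRowCount(true)` would force the job to open every matching data file.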

The last one may also be helpful: if a data file is accidentally removed or corrupted, the exception “java.io.FileNotFoundException: No such file or directory” is hit on any query to that Iceberg table that would have read that data file. Maybe a ‘repair manifest-list’ or ‘repair metadata’ action would also be useful later, to handle cases where the manifest files themselves are removed.
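The file-size repair and the removal of missing files could be sketched roughly as follows. This is a simplified model, not Iceberg code: `ManifestRepairSketch`, its `Entry` class, and the `sizeProbe` function are hypothetical stand-ins for manifest entries and a file system lookup, and a real action would read and rewrite actual manifest files instead of in-memory lists.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;
import java.util.function.Function;

// Hypothetical sketch: none of these types are Iceberg classes.
final class ManifestRepairSketch {

  /** Simplified stand-in for a manifest entry: a data file path plus its recorded size. */
  static final class Entry {
    final String path;
    final long fileSizeInBytes;

    Entry(String path, long fileSizeInBytes) {
      this.path = path;
      this.fileSizeInBytes = fileSizeInBytes;
    }
  }

  /** Size probe backed by the local file system; returns empty when the file is missing. */
  static OptionalLong localFileSize(String path) {
    try {
      return OptionalLong.of(Files.size(Paths.get(path)));
    } catch (NoSuchFileException e) {
      return OptionalLong.empty();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /**
   * Returns repaired entries: sizes are refreshed from the probe, and entries whose
   * file is missing are dropped when removeDeletedFiles is true, or kept unchanged otherwise.
   */
  static List<Entry> repair(
      List<Entry> entries, Function<String, OptionalLong> sizeProbe, boolean removeDeletedFiles) {
    List<Entry> repaired = new ArrayList<>();
    for (Entry entry : entries) {
      OptionalLong actualSize = sizeProbe.apply(entry.path);
      if (actualSize.isPresent()) {
        repaired.add(new Entry(entry.path, actualSize.getAsLong()));
      } else if (!removeDeletedFiles) {
        repaired.add(entry);
      }
      // Otherwise the file is gone and removal was requested, so the entry is dropped.
    }
    return repaired;
  }
}
```

Calling `ManifestRepairSketch.repair(entries, ManifestRepairSketch::localFileSize, true)` would refresh sizes from the local file system and drop entries for missing files. Passing the probe as a function keeps the sketch independent of any particular file system.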

This proposal is a bit different from ‘RewriteManifests’, which just rewrites manifests but does not fix them by reading data files. The only way to solve an issue like #1980 today would probably be to rewrite all data files.

Issue Analytics

Created 2 years ago
Comments: 16 (8 by maintainers)

Top GitHub Comments

RussellSpitzer commented, Apr 26, 2021

Ah yeah, I think we usually do new custom Scans; see https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/ManifestEntriesTable.java#L96-L140 for example. I would hesitate to touch anything around “planFiles” for this and would instead opt for reading the manifest files directly.

chenwyi2 commented, Nov 8, 2022

@KarlManong how did you solve this problem? Is the only way to put a metadata file into HDFS?