Delete files not eventually removed if RewriteDataFile run right after delete (when using 'use-starting-sequence-number' default)
This RewriteDataFile flag value ‘use-starting-sequence-number’ (added and default true in #3480) seems to prevent delete files from getting cleaned up if these operations are run:
Delete from my_table where =
=> new delete_file has sequence number = n
CALL %s.system.rewrite_data_files(table => my_table, options => map ('delete-file-threshold','1'))
=> new data files have sequence_number = n, because starting sequence number = n
Today, delete files are only cleaned up when their sequence number is less than that of all existing data files. So these delete files are not cleaned up in the subsequent operation (unlike when the flag is off and the rewritten files get the next sequence number, n+1). Because these data files were just successfully rewritten, it's doubtful their delete files can ever get cleaned up: further rewrites will probably skip these already-optimized data files. The delete files will stay until all of these data files are deleted.
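To make the sequence-number argument concrete, here is a minimal Python sketch of the cleanup rule as described above. It is a toy model, not Iceberg code: the function names `rewritten_sequence_number` and `is_removable` and the value `n = 5` are hypothetical and chosen only for illustration.

```python
# Toy model of the cleanup rule described in this issue (not Iceberg's implementation).
# All names below are hypothetical and exist only for illustration.

def rewritten_sequence_number(starting_seq, next_seq, use_starting_sequence_number):
    """Sequence number assigned to data files produced by a rewrite."""
    return starting_seq if use_starting_sequence_number else next_seq

def is_removable(delete_seq, live_data_seqs):
    """A delete file is removable only if it is older than every live data file."""
    return all(delete_seq < data_seq for data_seq in live_data_seqs)

n = 5            # assumed sequence number of the commit that added the delete file
delete_seq = n   # the new delete file has sequence number n

# Flag on (the default): rewritten data files keep the starting sequence number n.
seq_flag_on = rewritten_sequence_number(n, n + 1, use_starting_sequence_number=True)
removable_flag_on = is_removable(delete_seq, [seq_flag_on])

# Flag off: rewritten data files get the next sequence number n + 1.
seq_flag_off = rewritten_sequence_number(n, n + 1, use_starting_sequence_number=False)
removable_flag_off = is_removable(delete_seq, [seq_flag_off])

print(removable_flag_on, removable_flag_off)
```

Under this model the delete file is never strictly older than the rewritten data files when the flag is on, so the comparison fails and the file lingers; with the flag off it becomes removable.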
Top GitHub Comments
Note: plan to put a design doc up next week for this.
Looks like more people hit this issue.
@rdblue @aokolnychyi @RussellSpitzer @jackye1995 (or anyone else) FYI, in case you have any thoughts on this. It's probably not a huge issue, as the delete files will not apply, and we plan to eventually have removeDanglingDeleteFiles, but I'm not sure whether we need to document this for the time being or apply some other quick fix.