Hoodie clean is not deleting old files
I am trying to verify that Hudi cleaning is triggered and cleans up my files, but I do not see any cleaning action being performed on the old log files.
To Reproduce
- I am writing files to S3 using Hudi with the configuration below, multiple times (4-5 writes, to see cleaning triggered).
My hudi config
```python
table_name = "demosr"
hudi_options_prodcode = {
    'hoodie.table.name': table_name,
    'hoodie.datasource.write.recordkey.field': 'key',
    'hoodie.datasource.write.partitionpath.field': 'range_partition',
    'hoodie.datasource.write.table.name': table_name,
    'hoodie.datasource.write.precombine.field': 'update_date',
    'hoodie.datasource.write.table.type': 'MERGE_ON_READ',
    'hoodie.cleaner.policy': 'KEEP_LATEST_COMMITS',
    'hoodie.consistency.check.enabled': True,
    'hoodie.bloom.index.filter.type': 'dynamic_v0',
    'hoodie.bloom.index.bucketized.checking': False,
    'hoodie.memory.merge.max.size': '2004857600000',
    'hoodie.upsert.shuffle.parallelism': 500,
    'hoodie.insert.shuffle.parallelism': 500,
    'hoodie.bulkinsert.shuffle.parallelism': 500,
    'hoodie.parquet.small.file.limit': '204857600',
    'hoodie.parquet.max.file.size': '484402653184',
    'hoodie.memory.compaction.fraction': '384402653184',
    'hoodie.write.buffer.limit.bytes': str(128 * 1024 * 1024),
    'hoodie.compact.inline': True,
    'hoodie.compact.inline.max.delta.commits': 1,
    'hoodie.datasource.compaction.async.enable': False,
    'hoodie.parquet.compression.ratio': '0.35',
    'hoodie.logfile.max.size': '268435456',
    'hoodie.logfile.to.parquet.compression.ratio': '0.5',
    'hoodie.datasource.write.hive_style_partitioning': True,
    'hoodie.keep.min.commits': 2,
    'hoodie.keep.max.commits': 3,
    'hoodie.copyonwrite.record.size.estimate': 32,
    'hoodie.cleaner.commits.retained': 1,
    'hoodie.clean.automatic': True
}
```
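One value in the config above stands out: `hoodie.memory.compaction.fraction` is set to a byte-sized number, but Hudi expects a fraction of available memory (e.g. 0.3 or 0.5). A minimal hedged correction sketch, where 0.6 is an illustrative value I chose, not something from the original issue:

```python
# Sketch: only the key whose value looked mistyped is shown here.
# 0.6 is an assumed example fraction, not a recommendation from the issue.
hudi_options_prodcode = {
    # ... other options unchanged ...
    'hoodie.memory.compaction.fraction': '0.6',  # must be a fraction in (0, 1], not bytes
}

# Sanity check that the value parses as a valid fraction.
fraction = float(hudi_options_prodcode['hoodie.memory.compaction.fraction'])
assert 0.0 < fraction <= 1.0, "compaction fraction must be in (0, 1]"
```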
Writing to S3:

```python
path_to_delta_table = "s3://testdataprocessing/hudi_clean_test1/"
df.write.format("org.apache.hudi").options(**hudi_options_prodcode).mode("append").save(path_to_delta_table)
```
Expected behavior
As per my understanding, the log files should be deleted once the number of commits exceeds the max (3), keeping only one commit's worth of data at a time.
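One way to check whether cleaning actually ran is to inspect the Hudi timeline under `<table_path>/.hoodie`: completed clean actions appear as `<instant>.clean` files (alongside `.clean.requested` and `.clean.inflight` transition states). The helper below only classifies instant file names; the actual S3 listing (e.g. via boto3 or `aws s3 ls`) is left out, and the example file names are hypothetical:

```python
def completed_clean_instants(hoodie_files):
    """Return instant timestamps of completed clean actions.

    Completed cleans end in exactly ".clean"; requested/inflight
    states carry an extra suffix and are excluded.
    """
    return sorted(
        name.split('.')[0]
        for name in hoodie_files
        if name.endswith('.clean')
    )

# Hypothetical .hoodie timeline listing:
files = [
    '20201010101010.deltacommit',
    '20201010111111.commit',
    '20201010121212.clean.requested',
    '20201010121212.clean.inflight',
    '20201010121212.clean',
]
print(completed_clean_instants(files))  # ['20201010121212']
```

If this list stays empty across several writes, cleaning never completed, which matches what the issue describes.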
Environment Description
- Hudi version : 0.6.0
- Spark version : 2.4
- Hive version : 2.3.7
- Hadoop version :
- Storage (HDFS/S3/GCS…) : S3
- Running on Docker? (yes/no) : No
- EMR : 5.31.0
Issue Analytics
- State:
- Created: 3 years ago
- Comments: 34 (19 by maintainers)
Top GitHub Comments
@vinothchandar Yes please! If you can recommend a next step, @mauropelucchi will provide whatever assistance we can.
Slightly unrelated comment: I guess you might have to fix your config value for hoodie.memory.compaction.fraction. It is expected to be a fraction like 0.3 or 0.5. From your description, it looks like you have set some very large value for this config.
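For context, `hoodie.keep.min.commits` / `hoodie.keep.max.commits` bound the *active timeline* (they control archival of `.hoodie` metadata), while `hoodie.cleaner.commits.retained` controls how many commits' worth of data files the cleaner keeps. A toy model of the archival trigger, under the assumption that archiving kicks in when active commits exceed the max and trims the timeline down to the min:

```python
def archive_commits(active_commits, min_commits=2, max_commits=3):
    """Toy model of Hudi timeline archival: when the number of active
    commits exceeds max_commits, archive the oldest commits so that
    min_commits remain active. Returns (active, archived)."""
    if len(active_commits) <= max_commits:
        return active_commits, []
    cut = len(active_commits) - min_commits
    return active_commits[cut:], active_commits[:cut]

# With the issue's settings (min=2, max=3), a 4th commit triggers archival:
active, archived = archive_commits(['c1', 'c2', 'c3', 'c4'])
print(active)    # ['c3', 'c4']
print(archived)  # ['c1', 'c2']
```

So even with these settings, archival only trims timeline metadata; deleting old data files is the cleaner's job, governed by `hoodie.cleaner.commits.retained`.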