
Hoodie clean is not deleting old files

See original GitHub issue

I am trying to verify that Hudi cleaning is triggered, but I do not see any cleaning action being performed on the old log files.

To Reproduce

  1. I am writing files to S3 using Hudi with the configuration below, multiple times (4-5 writes, to give cleaning a chance to trigger).

My hudi config

```python
table_name = "demosr"
hudi_options_prodcode = {
    'hoodie.table.name': table_name,
    'hoodie.datasource.write.recordkey.field': 'key',
    'hoodie.datasource.write.partitionpath.field': 'range_partition',
    'hoodie.datasource.write.table.name': table_name,
    'hoodie.datasource.write.precombine.field': 'update_date',
    'hoodie.datasource.write.table.type': 'MERGE_ON_READ',
    'hoodie.cleaner.policy': 'KEEP_LATEST_COMMITS',
    'hoodie.consistency.check.enabled': True,
    'hoodie.bloom.index.filter.type': 'dynamic_v0',
    'hoodie.bloom.index.bucketized.checking': False,
    'hoodie.memory.merge.max.size': '2004857600000',
    'hoodie.upsert.shuffle.parallelism': 500,
    'hoodie.insert.shuffle.parallelism': 500,
    'hoodie.bulkinsert.shuffle.parallelism': 500,
    'hoodie.parquet.small.file.limit': '204857600',
    'hoodie.parquet.max.file.size': '484402653184',
    'hoodie.memory.compaction.fraction': '384402653184',
    'hoodie.write.buffer.limit.bytes': str(128 * 1024 * 1024),
    'hoodie.compact.inline': True,
    'hoodie.compact.inline.max.delta.commits': 1,
    'hoodie.datasource.compaction.async.enable': False,
    'hoodie.parquet.compression.ratio': '0.35',
    'hoodie.logfile.max.size': '268435456',
    'hoodie.logfile.to.parquet.compression.ratio': '0.5',
    'hoodie.datasource.write.hive_style_partitioning': True,
    'hoodie.keep.min.commits': 2,
    'hoodie.keep.max.commits': 3,
    'hoodie.copyonwrite.record.size.estimate': 32,
    'hoodie.cleaner.commits.retained': 1,
    'hoodie.clean.automatic': True
}
```

Writing to S3:

```python
path_to_delta_table = "s3://testdataprocessing/hudi_clean_test1/"
df.write.format("org.apache.hudi") \
    .options(**hudi_options_prodcode) \
    .mode("append") \
    .save(path_to_delta_table)
```
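Once several writes have gone through, one way to check whether the cleaner actually ran is to look for completed clean instants in the `.hoodie` timeline folder of the table path. This is a minimal sketch of that idea; the filenames below are hypothetical, and in practice you would list the S3 prefix (e.g. with boto3 or `aws s3 ls`) rather than hard-code a listing:

```python
# Sketch: pick completed clean actions out of a .hoodie timeline listing.
# A file named <instant_time>.clean marks a finished clean action.
def clean_instants(timeline_files):
    """Return the instant times of completed clean actions, sorted."""
    return sorted(
        name.split(".")[0]
        for name in timeline_files
        if name.endswith(".clean")
    )

# Hypothetical listing of s3://testdataprocessing/hudi_clean_test1/.hoodie/
listing = [
    "20210218091500.deltacommit",
    "20210218092000.clean",
    "20210218093000.deltacommit",
]
print(clean_instants(listing))  # → ['20210218092000']
```

If this list stays empty after 4-5 writes, the cleaner never completed, which matches the behavior reported above.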

Expected behavior

As per my understanding, the old log files should be deleted once the max commits setting (3) is exceeded, keeping only one retained commit at a time.
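For reference, the expectation above mixes two separate knobs: `hoodie.cleaner.commits.retained` drives cleaning of old data file slices, while `hoodie.keep.min.commits` / `hoodie.keep.max.commits` drive archival of the commit timeline, not data deletion. A deliberately simplified model of the `KEEP_LATEST_COMMITS` idea (illustrative only, not Hudi's actual implementation):

```python
# Simplified model of the KEEP_LATEST_COMMITS cleaning policy (illustration only).
def slices_to_clean(commit_times, commits_retained):
    """File slices written before the earliest retained commit are clean candidates."""
    ordered = sorted(commit_times)
    earliest_retained = ordered[-commits_retained]
    return [t for t in ordered if t < earliest_retained]

# With commits.retained = 1, slices from the first three of four commits
# become candidates for cleaning.
print(slices_to_clean(["c1", "c2", "c3", "c4"], commits_retained=1))  # → ['c1', 'c2', 'c3']
```

So with `hoodie.cleaner.commits.retained = 1`, one would expect cleaning candidates after the second successful commit, independent of the keep.min/max.commits archival bounds.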

Environment Description

  • Hudi version : 0.6.0

  • Spark version : 2.4

  • Hive version : 2.3.7

  • Hadoop version :

  • Storage (HDFS/S3/GCS…) : S3

  • Running on Docker? (yes/no) : No

  • EMR : 5.31.0

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 34 (19 by maintainers)

Top GitHub Comments

1 reaction
bgt-cdedels commented, Sep 27, 2021

@vinothchandar Yes please! If you can recommend a next step, @mauropelucchi will provide whatever assistance we can.

1 reaction
nsivabalan commented, Feb 18, 2021

Slightly unrelated comment: you might have to fix your config value for hoodie.memory.compaction.fraction. It is expected to be a fraction like 0.3 or 0.5. From your description, it looks like you are setting some very large value for this config.
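Following that suggestion, the fix would look something like the fragment below. The value 0.5 is just one of the examples from the comment, not a recommendation; the right fraction depends on your executor memory sizing:

```python
# hoodie.memory.compaction.fraction is a fraction of available memory (0-1],
# not a byte count as in the original config.
hudi_options_prodcode = {
    # ... other options unchanged ...
    'hoodie.memory.compaction.fraction': '0.5',  # was '384402653184'
}
```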


