[SUPPORT] How to delete / create Hudi Metadata Tables in AWS Glue?

I use Apache Hudi (v0.9) in AWS Glue.

Since I enabled the Hudi Metadata Table, I have been seeing a FileNotFoundException like the one below.

  • I have had “hoodie.metadata.enable=True” set from the very beginning.
  • I use ZooKeeper as the lock provider.
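
For reference, the setup described in the two bullets above might look roughly like the following set of Hudi write options. This is only a sketch: the post does not show the actual job configuration, so the table name, ZooKeeper host, port, lock key, and base path are hypothetical placeholders, and enabling optimistic concurrency control alongside the lock provider is an assumption.

```python
def hudi_write_options(table_name, zk_host, zk_port, lock_key, zk_base_path):
    """Build Hudi write options with the metadata table and a ZooKeeper lock provider."""
    return {
        "hoodie.table.name": table_name,
        # Metadata table enabled, as stated in the post.
        "hoodie.metadata.enable": "true",
        # Assumption: the lock provider is used for multi-writer concurrency control.
        "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
        "hoodie.cleaner.policy.failed.writes": "LAZY",
        "hoodie.write.lock.provider":
            "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
        "hoodie.write.lock.zookeeper.url": zk_host,
        "hoodie.write.lock.zookeeper.port": str(zk_port),
        "hoodie.write.lock.zookeeper.lock_key": lock_key,
        "hoodie.write.lock.zookeeper.base_path": zk_base_path,
    }


# Hypothetical values; replace with the real table and ZooKeeper settings.
options = hudi_write_options(
    "events_v0", "zk.example.internal", 2181, "events_v0", "/hudi/locks"
)
```

In a Glue or Spark job these options would typically be passed to the writer, for example `df.write.format("hudi").options(**options).mode("append").save(base_path)`.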

I suspect the FileNotFoundException occurs because the contents of the Hudi Metadata Table are out of date. Since I use AWS Glue, I have no way to run the Hudi Metadata CLI.

Is there a way to rebuild the Hudi Metadata Table by running a Glue (or Spark) job?
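
One possible direction, offered only as a sketch and not confirmed anywhere in this thread: the metadata table is stored under `.hoodie/metadata` inside the table's base path, and it is controlled by `hoodie.metadata.enable`. A job could therefore write with the metadata table disabled, and a later write with it re-enabled would be expected to initialize it again if the directory is absent. The helper names below are hypothetical, the base path is inferred from the stack trace, and the behavior should be verified against the Hudi documentation for the version in use before deleting anything.

```python
def metadata_table_path(base_path):
    """Return the location of the Hudi metadata table for a given table base path."""
    return base_path.rstrip("/") + "/.hoodie/metadata"


def with_metadata(options, enabled):
    """Return a copy of the write options with the metadata table toggled."""
    updated = dict(options)
    updated["hoodie.metadata.enable"] = "true" if enabled else "false"
    return updated


# Assumption: the table base path is the prefix seen in the stack trace.
base_path = "s3://staging/events_v0"
metadata_path = metadata_table_path(base_path)

# Hypothetical starting options; a real job would carry its full write config.
current = {"hoodie.table.name": "events_v0", "hoodie.metadata.enable": "true"}
disabled = with_metadata(current, enabled=False)
reenabled = with_metadata(disabled, enabled=True)
```

The original dictionary is left untouched, so the same base options can be reused for both the "disabled" and the "re-enabled" write.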

Thank you Gatsby

Caused by: java.io.FileNotFoundException: No such file or directory 's3://staging/events_v0/org_id=89/06a89e17-296b-4cf4-932f-684a95524090-0_22-8747-78425_20220204032706.parquet'
	at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:532)
	at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:694)
	at org.apache.parquet.hadoop.util.HadoopInputFile.fromPath(HadoopInputFile.java:61)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:456)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:441)
	at org.apache.hudi.common.util.ParquetUtils.readMetadata(ParquetUtils.java:176)

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

nsivabalan commented, Feb 7, 2022 (1 reaction)

By the way, a FileNotFoundException can also occur with the regular data table itself (even if you have disabled the metadata table) in some valid cases. For example, if your cleaner is aggressive and removes data files that a long-running query is still reading, the query can fail with a FileNotFoundException.
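
To make the cleaner scenario concrete, here is a rough way to size cleaner retention so that files are kept at least as long as the longest-running query. The formula is a simple heuristic assumed for illustration, not an official Hudi recommendation, and the 90-minute query and 15-minute commit interval are made-up numbers. `hoodie.cleaner.policy` and `hoodie.cleaner.commits.retained` are existing Hudi cleaner settings.

```python
import math


def commits_to_retain(longest_query_minutes, commit_interval_minutes, safety_margin=1):
    """Estimate how many commits the cleaner should retain.

    Heuristic: cover the longest query's duration in commits, plus a margin.
    """
    if commit_interval_minutes <= 0:
        raise ValueError("commit_interval_minutes must be positive")
    return math.ceil(longest_query_minutes / commit_interval_minutes) + safety_margin


# Hypothetical workload: queries up to 90 minutes, one commit every 15 minutes.
cleaner_options = {
    "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",
    "hoodie.cleaner.commits.retained": str(commits_to_retain(90, 15)),
}
```

With these example numbers the cleaner would retain 7 commits (6 to cover the query window plus 1 as a margin).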

nsivabalan commented, Feb 7, 2022 (1 reaction)

It should not be out of sync at all. If it is, we should chase down and triage the bug.
