[SUPPORT] Reconcile schema - missing field dropped from metadata
Describe the problem you faced
I'm using schema on read (the full schema evolution feature) together with the reconcile schema feature to evolve a Hudi table's schema. It is a COW table synchronized with the Glue Data Catalog.
In one batch (an upsert operation) I add a column (col_a) in the middle of the table. In the next batch (also an upsert) I add a new column (col_b) at the end of the table, but col_a is missing from that batch's data frame. When I then query the table via Athena or Spark SQL, col_a has been dropped and is not visible.
If I upsert a subsequent batch with a data frame that contains both col_a and col_b, all data becomes visible again in Spark and Athena.
I would expect Hudi to handle this case during the schema reconciliation phase and preserve col_a with null values.
To Reproduce
Steps to reproduce the behavior:
edit: I used dataFrame api to upsert data into the hudi table
Operations, step by step
| Batch seq | Operation | DF schema | Table schema | Expected table schema |
|---|---|---|---|---|
| 0 | insert | col_1: string, col_2: string | col_1: string, col_2: string | col_1: string, col_2: string |
| 1 | upsert | col_1: string, col_a: string, col_2: string | col_1: string, col_a: string, col_2: string | col_1: string, col_a: string, col_2: string |
| 2 | upsert | col_1: string, col_2: string, col_b: string | col_1: string, col_2: string, col_b: string | col_1: string, col_a: string, col_2: string, col_b: string |
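The batches above can be sketched with the DataFrame API roughly as follows. This is a minimal sketch, not the reporter's actual job: the table name, key field, and precombine field are hypothetical placeholders, and the Glue sync options are omitted.

```python
# Writer options assumed for the reproduction; key/precombine fields are
# hypothetical, only the schema-evolution flags matter for this report.
hudi_options = {
    "hoodie.table.name": "demo_table",                    # hypothetical name
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "col_1",   # assumed key field
    "hoodie.datasource.write.precombine.field": "col_2",  # assumed precombine
    "hoodie.schema.on.read.enable": "true",               # schema on read
    "hoodie.datasource.write.reconcile.schema": "true",   # reconcile schema
}

# Batch 1: df1 has col_1, col_a, col_2 -> col_a is added mid-table.
# df1.write.format("hudi").options(**hudi_options).mode("append").save(path)

# Batch 2: df2 has col_1, col_2, col_b (col_a absent) -> col_a gets dropped
# from the table schema instead of being preserved with nulls.
# df2.write.format("hudi").options(**hudi_options).mode("append").save(path)
```

The actual `write` calls are commented out because they need a running Spark session with the Hudi bundle on the classpath.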
Expected behavior
After batch 2 the table should have the schema col_1: string, col_a: string, col_2: string, col_b: string, with col_a preserved as null wherever the column's values are missing.
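The expected reconciliation amounts to a simple schema merge: keep every column already in the table (filling missing ones with null) and append genuinely new columns at the end. A plain-Python sketch of that expectation follows; it illustrates the desired behavior, not Hudi's actual implementation.

```python
def reconcile_schema(table_schema, incoming_schema):
    """Merge an incoming batch schema into the table schema as expected here:
    existing columns keep their position, new columns are appended at the end."""
    merged = list(table_schema)  # preserve existing columns, e.g. col_a
    for col in incoming_schema:
        if col not in merged:
            merged.append(col)   # append genuinely new columns, e.g. col_b
    return merged

# Batch 2: the incoming df lacks col_a but adds col_b.
table = ["col_1", "col_a", "col_2"]
incoming = ["col_1", "col_2", "col_b"]
print(reconcile_schema(table, incoming))
# -> ['col_1', 'col_a', 'col_2', 'col_b']
```

Records from batch 2 would then be written with col_a set to null, matching the "Expected table schema" column above.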
Environment Description
- Hudi version : 0.11.0 OSS
- Spark version : 3.2.0-amzn
- Hive version : 3.2.1
- Hadoop version : 3.2.1
- Storage (HDFS/S3/GCS..) : S3
- Running on Docker? (yes/no) : yes / EMR on EKS 6.6
Issue Analytics
- State:
- Created a year ago
- Comments: 8 (8 by maintainers)
Top GitHub Comments
@kazdy OK, thank you for your answer; let me fix this problem in the next few days.
@xiarixiaoyao Sounds good. Closing this ticket. I've made it a blocker of the 0.12 release.