Hive read issues when different partitions have different schemas
Hive reads the writer schema separately for each partition. If the schema has evolved and not every partition has been written since (i.e. the last change to some partition was made with the older schema), the Hive read for that partition fails because the new column is not available in that partition's schema.
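To make the failure mode concrete, here is a minimal sketch in plain Java. It is not Hudi or Hive code: a schema is modeled as a list of column names, and the class and method names (PartitionSchemaMismatchSketch, resolveProjection) are hypothetical. It only illustrates that resolving query columns against a per-partition writer schema breaks for partitions last written with the older schema.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not Hudi code: a schema is just an ordered list of column names.
class PartitionSchemaMismatchSketch {

    // Returns the position of each projected column in the given writer schema.
    // Throws if a projected column is missing from that schema, which is the
    // situation an old partition ends up in after a column has been added.
    static int[] resolveProjection(List<String> writerSchema, List<String> projectedColumns) {
        int[] positions = new int[projectedColumns.size()];
        for (int i = 0; i < projectedColumns.size(); i++) {
            int pos = writerSchema.indexOf(projectedColumns.get(i));
            if (pos < 0) {
                throw new IllegalStateException(
                    "Column '" + projectedColumns.get(i) + "' not found in writer schema " + writerSchema);
            }
            positions[i] = pos;
        }
        return positions;
    }

    public static void main(String[] args) {
        List<String> oldPartitionSchema = Arrays.asList("id", "name");          // last written before the change
        List<String> newPartitionSchema = Arrays.asList("id", "name", "email"); // written after the change
        List<String> queryColumns = Arrays.asList("id", "email");

        // The partition written with the new schema resolves fine.
        System.out.println(Arrays.toString(resolveProjection(newPartitionSchema, queryColumns)));

        // The partition still on the old schema cannot resolve the new column.
        try {
            resolveProjection(oldPartitionSchema, queryColumns);
        } catch (IllegalStateException e) {
            System.out.println("Old partition fails: " + e.getMessage());
        }
    }
}
```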
Concerned code (class: AbstractRealtimeRecordReader):
private void init() throws IOException {
    Schema schemaFromLogFile = LogReaderUtils.readLatestSchemaFromLogFiles(split.getBasePath(), split.getDeltaLogPaths(), jobConf);
    if (schemaFromLogFile == null) {
        writerSchema = InputSplitUtils.getBaseFileSchema((FileSplit) split, jobConf);
        LOG.info("Writer Schema From Parquet => " + writerSchema.getFields());
    } else {
        writerSchema = schemaFromLogFile;
        LOG.info("Writer Schema From Log => " + writerSchema.toString(true));
    }
    // (excerpt ends here; the rest of init() is not shown in the issue)
I tried replacing this writer schema lookup with the schema obtained from TableSchemaResolver, and that is working fine for me.
Does this look good? I haven’t followed the Hive flow in detail yet.
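The idea behind the change can be sketched in plain Java, under stated assumptions. This is not the TableSchemaResolver API and not the actual patch: the class and method names (TableSchemaFallbackSketch, conformToTableSchema) are hypothetical, a schema is modeled as a list of column names and a record as a map. It only shows that once a single table-level schema is used for every partition, records from partitions last written with the older schema can be read with the missing columns as null instead of failing.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Hudi code.
class TableSchemaFallbackSketch {

    // Rewrites a record written with an older partition schema into the
    // table-level schema. Columns the partition never wrote come out as null.
    static Map<String, Object> conformToTableSchema(Map<String, Object> record, List<String> tableSchema) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String column : tableSchema) {
            out.put(column, record.get(column)); // get() returns null when the column is absent
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed latest table-level schema, resolved once for the whole table.
        List<String> tableSchema = Arrays.asList("id", "name", "email");

        // A record from a partition last written before "email" was added.
        Map<String, Object> oldRecord = new LinkedHashMap<>();
        oldRecord.put("id", 1);
        oldRecord.put("name", "a");

        System.out.println(conformToTableSchema(oldRecord, tableSchema));
    }
}
```

Padding with null assumes the newly added column is nullable; whether that holds for a given table depends on how its schema was evolved.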
Issue Analytics
- Created 2 years ago
- Comments: 19 (19 by maintainers)
Top GitHub Comments
Have quite a busy sprint. I’ll try to get back. If not, I’ll update by weekend.
able to get hold of it, thanks. will close the github issue and follow up on the patch review. thanks!