Hive read issues when different partitions have different schemas


Hive reads the writer schema separately for each partition. If the schema has evolved and not every partition has been written to since (i.e. the last change to some partition was made with an older schema), then the Hive read for that partition fails because the new column is not available in that partition's schema.
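The failure mode described above can be sketched in plain Java. This is an illustration only and not Hudi or Hive code: the class `PartitionSchemaMismatch`, the method `resolvePositions`, and the column names `id`, `name`, and `email` are all hypothetical, and schemas are reduced to lists of column names.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionSchemaMismatch {
    // Hypothetical stand-in for a reader that resolves each requested column
    // against the writer schema of one partition.
    static List<Integer> resolvePositions(List<String> writerSchema, List<String> requested) {
        List<Integer> positions = new ArrayList<>();
        for (String column : requested) {
            int pos = writerSchema.indexOf(column);
            if (pos < 0) {
                // The partition was last written with an older schema.
                throw new IllegalStateException("Column not found in writer schema: " + column);
            }
            positions.add(pos);
        }
        return positions;
    }

    public static void main(String[] args) {
        List<String> oldPartitionSchema = List.of("id", "name");          // written before the change
        List<String> newPartitionSchema = List.of("id", "name", "email"); // written after the change
        List<String> requested = List.of("id", "email");                  // query uses the new column

        System.out.println(resolvePositions(newPartitionSchema, requested));
        try {
            resolvePositions(oldPartitionSchema, requested);
        } catch (IllegalStateException e) {
            System.out.println("Read failed: " + e.getMessage());
        }
    }
}
```

In this sketch the same query succeeds on the partition written with the newer schema and fails on the one that still carries the older schema.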

Relevant code (class: AbstractRealtimeRecordReader):

private void init() throws IOException {
    // Prefer the latest schema recorded in this split's delta log files.
    Schema schemaFromLogFile = LogReaderUtils.readLatestSchemaFromLogFiles(split.getBasePath(), split.getDeltaLogPaths(), jobConf);
    if (schemaFromLogFile == null) {
        // No schema found in the log files: fall back to the base (Parquet) file schema.
        writerSchema = InputSplitUtils.getBaseFileSchema((FileSplit) split, jobConf);
        LOG.info("Writer Schema From Parquet => " + writerSchema.getFields());
    } else {
        writerSchema = schemaFromLogFile;
        LOG.info("Writer Schema From Log => " + writerSchema.toString(true));
    }
    // ... (rest of init() not shown in this excerpt)
}

I tried replacing this writer schema with the schema obtained from TableSchemaResolver. This works fine for me.
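The idea behind that change, using one table-level schema rather than a per-partition one, can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual patch: it does not call TableSchemaResolver, and the class `TableSchemaProjection`, the method `project`, and the column names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TableSchemaProjection {
    // Projects a record onto the table-level schema. Columns that the record
    // does not carry (because it was written with an older schema) become null.
    static Map<String, Object> project(Map<String, Object> record, List<String> tableSchema) {
        Map<String, Object> projected = new LinkedHashMap<>();
        for (String column : tableSchema) {
            projected.put(column, record.get(column)); // get() returns null when absent
        }
        return projected;
    }

    public static void main(String[] args) {
        // Assumed latest schema for the whole table.
        List<String> tableSchema = List.of("id", "name", "email");

        // Record from a partition last written before "email" was added.
        Map<String, Object> oldRecord = new LinkedHashMap<>();
        oldRecord.put("id", 1);
        oldRecord.put("name", "a");

        System.out.println(project(oldRecord, tableSchema));
    }
}
```

Here a record from an older partition still resolves against the latest schema, with the missing column read as null instead of causing a failure. Whether null is an acceptable default for every evolved column is an assumption of this sketch.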

Does this look good? I haven’t followed the Hive flow in detail yet.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 19 (19 by maintainers)

Top GitHub Comments

1 reaction
aditiwari01 commented, Apr 26, 2021

Have quite a busy sprint. I’ll try to get back. If not, I’ll update by weekend.

0 reactions
nsivabalan commented, Jan 2, 2022

able to get hold of it, thanks. will close the github issue and follow up on the patch review. thanks!


Top Results From Across the Web

HIVE_PARTITION_SCHEMA_MI...
The issue is happening because of not the same column is of different type in both table and partition's metadata. It is because...

Re: One Schema Per Partition? (Multiple schemas per table?)
I have finally gotten around to testing this functionality, and it would doesn't work. The ALTER table change columns command just changes ...

Solving Hive Partition Schema Mismatch Errors in Athena
HIVE_PARTITION_SCHEMA_MISMATCH: There is a mismatch between the table and partition schemas. The types are incompatible and cannot be coerced.

Working with multiple partition formats within a Hive table with ...
Creating a working example in Hive CREATE DATABASE test; USE test; CREATE EXTERNAL TABLE IF NOT EXISTS events(eventType STRING, city STRING)  ...

One Schema Per Partition? (Multiple schemas per table?)
I found a set of slides from Facebook online about Hive that claims you can have a schema per partition in the table,...
