
[SUPPORT] Exception while Querying Hive _rt table


Describe the problem you faced

I am using a Spark DataFrame to persist a Hudi table, with Hive sync enabled. Queries against the *_ro table work fine, but querying the *_rt table fails with the exception below.

  • I am using a custom class for `preCombine` and `combineAndGetUpdateValue`, so I have included my jar file in the ${Hive}/lib folder.

  • I also tried setting, in the Hive session, `set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;` and `set hive.fetch.task.conversion=none;`.
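For reference, the session-level settings above can be combined with Hive's standard ADD JAR command to make the custom payload class visible to the query without restarting Hive (the jar path below is hypothetical):

```sql
-- Illustrative Hive session setup; the jar path is a placeholder.
ADD JAR /path/to/custom-payload-shaded.jar;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.fetch.task.conversion=none;
```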

Versions: Hive 2.3.7, Spark 2.x, Hudi 0.6.0 (hudi-hadoop-mr-bundle-0.6.0.jar)

Actual exception: `Caused by: java.lang.ClassCastException: org.apache.hudi.org.apache.avro.generic.GenericData$Record cannot be cast to org.apache.avro.generic.GenericRecord`

CREATE EXTERNAL TABLE `bhuvan_123_ro`(
  `_hoodie_commit_time` string, 
  `_hoodie_commit_seqno` string, 
  `_hoodie_record_key` string, 
  `_hoodie_partition_path` string, 
  `_hoodie_file_name` string, 
  `ts_ms` bigint, 
  `pincode` double, 
  `image_link` string, 
  `_id` string, 
  `op` string, 
  `a` string, 
  `b` string, 
  `c` string, 
  `d` string, 
  `e` double)
PARTITIONED BY ( 
  `db_name` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hudi.hadoop.HoodieParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'file:/tmp/test/hudi-user-data/MOE_PRODUCT_INFO.bhuvan_123'
TBLPROPERTIES (
  'last_commit_time_sync'='20201010202918', 
  'transient_lastDdlTime'='1602341935')
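For comparison, the companion _rt table registered by Hive sync is expected to match the _ro DDL above except for the input format, which is what routes reads through the real-time merge path (this is a sketch, not the exact synced DDL):

```sql
-- The _rt table should look like the _ro table above, except:
STORED AS INPUTFORMAT
  'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
```

This is why only _rt queries fail here: the realtime input format merges base parquet files with log files at read time, and that merge path instantiates the custom payload class inside Hive.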

Full stack trace:

org.apache.hudi.exception.HoodieException: Unable to instantiate payload class 
	at org.apache.hudi.common.util.ReflectionUtils.loadPayload(ReflectionUtils.java:78) ~[hudi-hadoop-mr-bundle-0.6.0.jar:0.6.0]
	at org.apache.hudi.common.util.SpillableMapUtils.convertToHoodieRecordPayload(SpillableMapUtils.java:116) ~[hudi-hadoop-mr-bundle-0.6.0.jar:0.6.0]
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processDataBlock(AbstractHoodieLogRecordScanner.java:277) ~[hudi-hadoop-mr-bundle-0.6.0.jar:0.6.0]
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processQueuedBlocksForInstant(AbstractHoodieLogRecordScanner.java:306) ~[hudi-hadoop-mr-bundle-0.6.0.jar:0.6.0]
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:239) ~[hudi-hadoop-mr-bundle-0.6.0.jar:0.6.0]
	at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:81) ~[hudi-hadoop-mr-bundle-0.6.0.jar:0.6.0]
	at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.getMergedLogRecordScanner(RealtimeCompactedRecordReader.java:76) ~[hudi-hadoop-mr-bundle-0.6.0.jar:0.6.0]
	at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.<init>(RealtimeCompactedRecordReader.java:55) ~[hudi-hadoop-mr-bundle-0.6.0.jar:0.6.0]
	at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:70) ~[hudi-hadoop-mr-bundle-0.6.0.jar:0.6.0]
	at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.<init>(HoodieRealtimeRecordReader.java:47) ~[hudi-hadoop-mr-bundle-0.6.0.jar:0.6.0]
	at org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:186) ~[hudi-hadoop-mr-bundle-0.6.0.jar:0.6.0]
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:376) ~[hive-exec-2.3.7.jar:2.3.7]
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169) ~[hadoop-mapreduce-client-core-2.10.0.jar:?]
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:438) ~[hadoop-mapreduce-client-core-2.10.0.jar:?]
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) ~[hadoop-mapreduce-client-core-2.10.0.jar:?]
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270) ~[hadoop-mapreduce-client-common-2.10.0.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_222]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_222]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_222]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_222]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_222]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_222]
	at org.apache.hudi.common.util.ReflectionUtils.loadPayload(ReflectionUtils.java:76) ~[hudi-hadoop-mr-bundle-0.6.0.jar:0.6.0]
	... 20 more
Caused by: java.lang.ClassCastException: org.apache.hudi.org.apache.avro.generic.GenericData$Record cannot be cast to org.apache.avro.generic.GenericRecord
	at com.moengage.dpm.jobs.MergeHudiPayload.<init>(MergeHudiPayload.java:41) ~[dpm-feed-spark-jobs-1.0.10-rc0.jar:?]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_222]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_222]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_222]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_222]
	at org.apache.hudi.common.util.ReflectionUtils.loadPayload(ReflectionUtils.java:76) ~[hudi-hadoop-mr-bundle-0.6.0.jar:0.6.0]

Line where the exception is thrown (MergeHudiPayload.java:41):

public MergeHudiPayload(Option<GenericRecord> record) {
    this(record.isPresent() ? record.get() : null, (record1) -> 0); // natural order
}

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
bvaradar commented, Oct 13, 2020

@tandonraghav : Yes, you need to shade the jar containing the custom record payload. Here is some context: http://hudi.apache.org/releases.html#release-highlights-1

Look for the section starting with:

"With 0.5.1, hudi-hadoop-mr-bundle which is used by query engines such as presto and hive includes shaded avro package to support hudi real time queries through these"

More Context: https://issues.apache.org/jira/browse/HUDI-519
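One way to apply that advice, assuming the payload class lives in your own Maven-built jar, is a maven-shade-plugin relocation whose target package matches the shaded Avro seen in the exception; the plugin version here is illustrative:

```xml
<!-- Sketch: relocate Avro references inside the jar carrying the custom
     payload class so they line up with the Avro copy bundled (shaded)
     in hudi-hadoop-mr-bundle. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.4</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>org.apache.avro</pattern>
            <shadedPattern>org.apache.hudi.org.apache.avro</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

After rebuilding, the shaded jar (not the original) is the one to place in ${Hive}/lib or to register in the session.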

0 reactions
tandonraghav commented, Oct 13, 2020

@bvaradar Thanks for the help. I was able to resolve it by putting the shaded jar in place. I feel this should be documented better, for example in:

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hudi-considerations.html & https://hudi.apache.org/docs/querying_data.html
