[SUPPORT] unhelpful error message when there are parquets outside table base path
Using Hudi (hoodie) 0.4.6 and Spark 2.3.4.
Run the following in HiveServer2 (v2.3.4):
CREATE EXTERNAL TABLE `someschema.mytbl`(
col1 string,
col2 string,
col3 string)
PARTITIONED BY (
`mydate` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'com.uber.hoodie.hadoop.HoodieInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
's3a://redact/M5/table/mytbl'
#use Spark to create COW Hudi parquet files under s3://redact/M5/table/mytbl/2016/11/07/ and s3://redact/M/table/mytbl/2019/12/01/ (note the second path is under M/ rather than M5/, i.e. outside the table base path)
Run the following in HiveServer2:
ALTER TABLE someschema.mytbl ADD IF NOT EXISTS PARTITION(mydate='2016-11-07')
LOCATION 's3a://redact/M5/table/mytbl/2016/11/07/'
ALTER TABLE someschema.mytbl ADD IF NOT EXISTS PARTITION(mydate='2019-12-01')
LOCATION 's3a://redact/M/table/mytbl/2019/12/01/'
The Hive metastore now shows these two rows:
select TBLS.TBL_NAME,PARTITIONS.PART_NAME,SDS.LOCATION
from SDS,TBLS,PARTITIONS
where PARTITIONS.SD_ID = SDS.SD_ID
and TBLS.TBL_ID=PARTITIONS.TBL_ID
and TBLS.TBL_NAME = 'mytbl'
order by 1,2;
mytbl mydate=2016-11-07 s3a://redact/M5/table/mytbl/2016/11/07
mytbl mydate=2019-12-01 s3a://redact/M/table/mytbl/2019/12/01
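As the issue title suggests, the second partition's location (under M/ rather than M5/) lies outside the table's base path. A hypothetical sanity check over the metastore rows above, not part of any existing tool, would flag it:

```python
# Hypothetical check: flag partition locations that are not under the
# table's base path. Values are taken from the metastore rows above.
base_path = "s3a://redact/M5/table/mytbl"
partitions = {
    "mydate=2016-11-07": "s3a://redact/M5/table/mytbl/2016/11/07",
    "mydate=2019-12-01": "s3a://redact/M/table/mytbl/2019/12/01",
}

for part, location in partitions.items():
    # A location is "inside" only if it starts with the base path plus
    # a path separator (plain startswith would wrongly accept .../M5x/...).
    inside = location.startswith(base_path.rstrip("/") + "/")
    print(f"{part}: {'OK' if inside else 'OUTSIDE base path'} ({location})")
```

Running this prints OK for the 2016-11-07 partition and OUTSIDE base path for 2019-12-01, matching which of the two queries below fails.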
Query 1 works fine from both HiveServer2 and Presto:
select count(1) from someschema.mytbl where mydate = '2016-11-07'
Query 2 fails; Presto gives an unhelpful error:
select count(1) from someschema.mytbl where mydate = '2019-12-01'
io.prestosql.spi.PrestoException: HIVE_UNKNOWN_ERROR
at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:223)
at io.prestosql.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
at io.prestosql.$gen.Presto_ff748c3_dirty____20200610_171635_2.run(Unknown Source)
at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ArrayIndexOutOfBoundsException: undefined
HiveServer2 gives a more verbose, yet still not very helpful, error:
2020-06-12T18:22:23,375 WARN [HiveServer2-Handler-Pool: Thread-12109] thrift.ThriftCLIService: Error fetching results:
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 2
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:499) ~[hive-service-2.3.4.jar:2.3.4]
at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:307) ~[hive-service-2.3.4.jar:2.3.4]
at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:878) ~[hive-service-2.3.4.jar:2.3.4]
at sun.reflect.GeneratedMethodAccessor135.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_252]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_252]
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) ~[hive-service-2.3.4.jar:2.3.4]
at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) ~[hive-service-2.3.4.jar:2.3.4]
at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) ~[hive-service-2.3.4.jar:2.3.4]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_252]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_252]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844) ~[hadoop-common-2.8.5.jar:?]
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) ~[hive-service-2.3.4.jar:2.3.4]
at com.sun.proxy.$Proxy42.fetchResults(Unknown Source) ~[?:?]
at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:559) ~[hive-service-2.3.4.jar:2.3.4]
at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:751) ~[hive-service-2.3.4.jar:2.3.4]
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1717) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1702) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) ~[hive-service-2.3.4.jar:2.3.4]
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) ~[hive-exec-2.3.4.jar:2.3.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 2
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:494) ~[hive-service-2.3.4.jar:2.3.4]
... 24 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
at com.uber.hoodie.common.util.FSUtils.getCommitTime(FSUtils.java:120) ~[hoodiebundle.jar:?]
at com.uber.hoodie.common.model.HoodieDataFile.getCommitTime(HoodieDataFile.java:37) ~[hoodiebundle.jar:?]
at com.uber.hoodie.common.model.HoodieFileGroup.addDataFile(HoodieFileGroup.java:89) ~[hoodiebundle.jar:?]
at com.uber.hoodie.common.table.view.HoodieTableFileSystemView.lambda$null$3(HoodieTableFileSystemView.java:155) ~[hoodiebundle.jar:?]
at java.util.ArrayList.forEach(ArrayList.java:1257) ~[?:1.8.0_252]
at com.uber.hoodie.common.table.view.HoodieTableFileSystemView.lambda$addFilesToView$5(HoodieTableFileSystemView.java:155) ~[hoodiebundle.jar:?]
at java.lang.Iterable.forEach(Iterable.java:75) ~[?:1.8.0_252]
at com.uber.hoodie.common.table.view.HoodieTableFileSystemView.addFilesToView(HoodieTableFileSystemView.java:151) ~[hoodiebundle.jar:?]
at com.uber.hoodie.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:107) ~[hoodiebundle.jar:?]
at com.uber.hoodie.hadoop.HoodieInputFormat.listStatus(HoodieInputFormat.java:88) ~[hoodiebundle.jar:?]
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:322) ~[hadoop-mapreduce-client-core-2.8.5.jar:?]
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:372) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:304) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:494) ~[hive-service-2.3.4.jar:2.3.4]
... 24 more
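The root cause is visible in the bottom frame: FSUtils.getCommitTime extracts the commit timestamp from the data-file name by position. A Hudi-written file is named fileId_writeToken_commitTime.parquet, so splitting on underscores works; a plain Spark output file such as part-00007-...-c000.snappy.parquet contains no underscores at all, and the positional lookup blows up. A rough Python sketch of what the parsing appears to do, inferred from the stack trace and the file names reported later in this issue (the exact Java implementation may differ):

```python
def get_commit_time(full_file_name: str) -> str:
    """Rough analogue of FSUtils.getCommitTime: take the third
    underscore-separated token and strip the file extension."""
    return full_file_name.split("_")[2].split(".")[0]

# Hudi-written file name: parses cleanly.
print(get_commit_time(
    "4b37466c-8b75-458e-ba28-1e0f4c350dbe_0_20200324151845.parquet"))
# -> 20200324151845

# Plain Spark output name has no underscores, so split("_") yields a
# single token and index 2 is out of range -- the Python analogue of
# java.lang.ArrayIndexOutOfBoundsException: 2 seen above.
try:
    get_commit_time(
        "part-00007-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet")
except IndexError as e:
    print("IndexError:", e)
```

This is why the error surfaces only for the partition containing non-Hudi parquet files: the file-system view tries to build a file group from every parquet it lists, Hudi-written or not.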
Issue Analytics: created 3 years ago; 9 comments (5 by maintainers)
Top GitHub Comments
prestosql 336 with Hudi 0.5.3 gives a better error:
After adding a log statement for fullFileName, I see the value is part-00007-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet, while for a table that can be queried, fullFileName is 4b37466c-8b75-458e-ba28-1e0f4c350dbe_0_20200324151845.parquet.
S3 listing under a partition folder of the table that works (there is a .hoodie/ folder under the table base path):
2020-03-24 15:18:55        93 .hoodie_partition_metadata
2020-03-24 15:18:57   2194374 4b37466c-8b75-458e-ba28-1e0f4c350dbe_0_20200324151845.parquet
S3 listing under a partition folder of the table that gets the error (there is a .hoodie/ folder under the table base path):
2020-03-24 15:18:44         0 _SUCCESS
2020-03-24 15:18:37  10649992 part-00000-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:38   8787785 part-00001-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:39   9562198 part-00002-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:40   9359329 part-00003-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:41  10519118 part-00004-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:42  10452807 part-00005-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:42   9104366 part-00006-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:43   9016423 part-00007-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
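The two listings differ in kind, not just in content: the working partition holds .hoodie_partition_metadata plus a Hudi-named data file, while the broken one holds plain Spark output (_SUCCESS plus part-* files). A small sketch that screens a partition's file names before querying — the regex below is an approximation of Hudi's COW data-file naming, not its canonical pattern:

```python
import re

# Approximate pattern for a Hudi COW data file name:
#   <fileId>_<writeToken>_<commitTime>.parquet
HUDI_DATA_FILE = re.compile(r"^[0-9a-f-]+_\d+_\d+\.parquet$")

working = [
    ".hoodie_partition_metadata",
    "4b37466c-8b75-458e-ba28-1e0f4c350dbe_0_20200324151845.parquet",
]
broken = [
    "_SUCCESS",
    "part-00000-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet",
]

# Flag parquet files whose names Hudi's commit-time parsing cannot handle.
for name in working + broken:
    if name.endswith(".parquet") and not HUDI_DATA_FILE.match(name):
        print("not a Hudi-managed parquet:", name)
```

Only the part-* file from the broken partition is flagged; a check like this would have pointed at the offending files long before the ArrayIndexOutOfBoundsException did.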
UPDATE: This is a really old table that got corrupted along the way. After removing the .hoodie/ folder, the select works OK.
Yes… makes sense… closing this issue.