Insert into Elasticsearch from a partitioned table throws error
Hello,
I have noticed that selecting data from a partitioned Hive table and inserting it into Elasticsearch does not work: the map-reduce job ends with the following error.
URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1458893148211_0019&tipid=task_1458893148211_0019_m_000000
-----
Diagnostic Messages for this Task:
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:265)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:139)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:251)
... 11 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getProjectedGroupFields(DataWritableReadSupport.java:110)
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getSchemaByName(DataWritableReadSupport.java:155)
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:221)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:256)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:95)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:81)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:66)
... 16 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.
WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.
I have tested similar scenarios using different source tables (stored as Parquet, and as Parquet with Snappy compression) and they work fine. But when I use a partitioned Hive table as the source table, the job fails with the above error.
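For reference, here is a minimal sketch of the failing flow as I understand it (the table names, index name, and node address are hypothetical; the actual scripts are in the zip attached below). The target table is mapped to Elasticsearch through ES-Hadoop's EsStorageHandler, and the source is a partitioned Parquet table:

-- Target table mapped to an Elasticsearch index via ES-Hadoop
CREATE EXTERNAL TABLE es_target (id STRING, name STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource' = 'myindex/mytype', 'es.nodes' = 'localhost:9200');

-- Partitioned source table stored as Parquet
CREATE TABLE src_partitioned (id STRING, name STRING)
PARTITIONED BY (dt STRING)
STORED AS PARQUET;

-- This INSERT launches the map-reduce job that fails with the error above
INSERT INTO TABLE es_target SELECT id, name FROM src_partitioned;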
I used the Cloudera 5.5 VM for Hadoop, with elasticsearch-2.2.1 and elasticsearch-hadoop-2.2.0-rc1.jar, for my tests.
I have attached a zip file with two HQL scripts and the ES-Hadoop jar for reproducing this issue.
Thanks and regards,
Sa’M
Top GitHub Comments
My solution is to create a temp table; maybe the issue is in Parquet.

1. Create a temp table by selecting the data from the Parquet-format table:

CREATE TABLE temp.order_index_2016 ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS RCFile AS
SELECT id, userId, substr(createTime, 0, 19) AS createTime FROM ods.b_order
WHERE time >= '2016-01-01' AND time < '2017-01-01';

2. Use the temp table as the source table to load into ES (the Hive table was already mapped to ES with ES-Hadoop):

INSERT INTO temp.order_index_es SELECT id, userId, createTime FROM temp.order_index_2016;

Both steps pass, and this exception doesn't happen.
I hit the same issue with es-hadoop-5.2.2 and hive-1.2.1, but when I change the storage format of the source from Parquet to ORC, it works.
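A minimal sketch of that ORC variant, reusing the table names from the comment above (the column list and filter are assumed to match):

-- Rebuild the intermediate table as ORC instead of Parquet
CREATE TABLE temp.order_index_orc STORED AS ORC AS
SELECT id, userId, substr(createTime, 0, 19) AS createTime FROM ods.b_order
WHERE time >= '2016-01-01' AND time < '2017-01-01';

-- Load the ES-mapped table from the ORC source; the exception should not occur
INSERT INTO temp.order_index_es SELECT id, userId, createTime FROM temp.order_index_orc;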