Insert into Elasticsearch from a partitioned table throws error
Hello,
I have noticed that selecting data from a partitioned Hive table and inserting it into Elasticsearch does not work: the map-reduce job ends with the following error.
URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1458893148211_0019&tipid=task_1458893148211_0019_m_000000
-----
Diagnostic Messages for this Task:
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:265)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:139)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:251)
... 11 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getProjectedGroupFields(DataWritableReadSupport.java:110)
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getSchemaByName(DataWritableReadSupport.java:155)
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:221)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:256)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:95)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:81)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:66)
... 16 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.
WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.
I have tested similar scenarios using different source tables (stored as Parquet, and as Parquet with Snappy compression) and they work fine. But when I use a partitioned Hive table as the source table, the job fails with the above error.
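For reference, here is a minimal sketch of the failing flow as I understand it (the table names, index name, and node address are hypothetical; the actual scripts are in the zip attached below). The target table is mapped to Elasticsearch through ES-Hadoop's EsStorageHandler, and the source is a partitioned Parquet table:

-- Target table mapped to an Elasticsearch index via ES-Hadoop
CREATE EXTERNAL TABLE es_target (id STRING, name STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource' = 'myindex/mytype', 'es.nodes' = 'localhost:9200');

-- Partitioned source table stored as Parquet
CREATE TABLE src_partitioned (id STRING, name STRING)
PARTITIONED BY (dt STRING)
STORED AS PARQUET;

-- This INSERT launches the map-reduce job that fails with the error above
INSERT INTO TABLE es_target SELECT id, name FROM src_partitioned;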
I used the Cloudera 5.5 VM for Hadoop, with elasticsearch-2.2.1 and elasticsearch-hadoop-2.2.0-rc1.jar, for my tests.
I have attached a zip file with two HQL scripts and the ES-Hadoop jar for reproducing this issue.
Thanks and regards,
Sa’M
Top GitHub Comments
My solution is to create a temp table; maybe the issue is in Parquet.

1. Create a temp table by selecting the data from the Parquet-format table:

CREATE TABLE temp.order_index_2016 ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS RCFile AS
SELECT id, userId, substr(createTime, 0, 19) AS createTime FROM ods.b_order
WHERE time >= '2016-01-01' AND time < '2017-01-01';

2. Use the temp table as the source table to load into ES (the Hive table was already mapped to ES with ES-Hadoop):

INSERT INTO temp.order_index_es SELECT id, userId, createTime FROM temp.order_index_2016;

Both steps pass, and this exception doesn't happen.
I hit the same issue with es-hadoop-5.2.2 and hive-1.2.1, but when I change the storage format of the source from Parquet to ORC, it works.
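A minimal sketch of that ORC variant, reusing the table names from the comment above (the column list and filter are assumed to match):

-- Rebuild the intermediate table as ORC instead of Parquet
CREATE TABLE temp.order_index_orc STORED AS ORC AS
SELECT id, userId, substr(createTime, 0, 19) AS createTime FROM ods.b_order
WHERE time >= '2016-01-01' AND time < '2017-01-01';

-- Load the ES-mapped table from the ORC source; the exception should not occur
INSERT INTO temp.order_index_es SELECT id, userId, createTime FROM temp.order_index_orc;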