
Insert into elastic search from a partitioned table throws error

See original GitHub issue

Hello,

I have noticed that selecting data from a partitioned Hive table and inserting into Elasticsearch does not work very well; the MapReduce job ends with the following error.


URL:
  http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1458893148211_0019&tipid=task_1458893148211_0019_m_000000
-----
Diagnostic Messages for this Task:
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:265)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:139)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:251)
        ... 11 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
        at java.util.ArrayList.rangeCheck(ArrayList.java:635)
        at java.util.ArrayList.get(ArrayList.java:411)
        at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getProjectedGroupFields(DataWritableReadSupport.java:110)
        at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getSchemaByName(DataWritableReadSupport.java:155)
        at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:221)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:256)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:95)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:81)
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:66)
        ... 16 more


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-0: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.
WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.

I have tested similar scenarios using different source tables (stored as Parquet, and as Snappy-compressed Parquet) and they work fine. But when I use a partitioned Hive table as the source table, the job fails with the above error.
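To make the failing scenario concrete, here is a minimal sketch of the kind of statements involved. All table, column, and index names below are hypothetical (the actual scripts are in the attached zip); it assumes the standard ES-Hadoop Hive integration via its storage handler:

```sql
-- Source table: partitioned, stored as Parquet (hypothetical names).
CREATE TABLE logs_parquet (
  id          BIGINT,
  user_id     STRING,
  create_time STRING
)
PARTITIONED BY (dt STRING)
STORED AS PARQUET;

-- Target table: mapped to an Elasticsearch index through ES-Hadoop.
ADD JAR elasticsearch-hadoop-2.2.0-rc1.jar;

CREATE EXTERNAL TABLE logs_es (
  id          BIGINT,
  user_id     STRING,
  create_time STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource' = 'logs/log', 'es.nodes' = 'localhost');

-- Reading from the partitioned Parquet table and writing to the
-- ES-backed table is the step that fails with the
-- IndexOutOfBoundsException shown above.
INSERT OVERWRITE TABLE logs_es
SELECT id, user_id, create_time
FROM logs_parquet
WHERE dt = '2016-01-01';
```

The same insert succeeds when the source is an unpartitioned Parquet table, which suggests the problem lies in how Hive's Parquet reader resolves the projected schema for partitioned splits (note the stack trace ends in `DataWritableReadSupport.getProjectedGroupFields`).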

I used the Cloudera 5.5 VM for Hadoop, with elasticsearch-2.2.1 and elasticsearch-hadoop-2.2.0-rc1.jar, for my tests.

I attach a zip file with two HQL scripts and the ES-Hadoop jar for reproducing this issue.

Hive-ES.zip

Thanks and regards,
Sa’M

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

4 reactions
lvguanming commented, Apr 25, 2017

My solution is to create a temp table. Maybe the issue is in Parquet.

  1. Create a temp table by selecting the data out of the Parquet-format table:

         CREATE TABLE temp.order_index_2016
         ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
         STORED AS RCFILE
         AS SELECT id, userId, substr(createTime, 0, 19) AS createTime
         FROM ods.b_order
         WHERE time >= '2016-01-01' AND time < '2017-01-01';

  2. Use the temp table as the source table to load into ES (the Hive table was already mapped to ES via ES-Hadoop beforehand):

         INSERT INTO temp.order_index_es
         SELECT id, userId, createTime
         FROM temp.order_index_2016;

With the two steps above, the test passes and this exception doesn’t happen.

1 reaction
kiddingbaby commented, Jul 24, 2017

I hit the same issue with es-hadoop-5.2.2 and hive-1.2.1, but when I change the storage format from Parquet to ORC it works.
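A hedged sketch of that ORC variant (table and column names are hypothetical, assuming an ES-mapped target table was already created with ES-Hadoop as in the earlier workaround):

```sql
-- Rebuild the Parquet source as ORC, then load into ES from the ORC copy.
CREATE TABLE order_index_orc
STORED AS ORC
AS SELECT id, userId, createTime
FROM order_index_parquet;

-- Loading from the ORC table avoids the Parquet schema-projection path
-- that fails for partitioned sources.
INSERT INTO TABLE order_index_es
SELECT id, userId, createTime
FROM order_index_orc;
```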


