Hive loading data into ES fails with org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings) - all nodes failed
Thanks for @costin's advice, I have created a separate issue.
Using Hive 1.2.1 to load data from Hive into ES, I have succeeded in loading billions of records into ES, but when I try to update that data in ES, the job fails after several hours. Briefly: loading data into an empty index via Hive works, but updating that large-scale ES data via Hive fails midway.
@costin, I read your reply in another thread: https://discuss.elastic.co/t/spark-es-batch-write-retry-count-negative-value-is-ignored/25436/2
es.batch.write.retry.count should work. Note that the connector has two types of retries: one for network hiccups and one for documents rejected by ES.
I have no idea which type I ran into, network hiccups or document rejections. And if I set it to a negative number, will that prevent the job from stopping midway?
Also, under the same conditions, a full update of the index fails while recreating the index from scratch succeeds; does updating an index cost more ES resources than writing to a fresh one?
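For reference, these are the two bulk-retry settings from my table definition below (the annotations are my reading of the connector docs, not confirmed behavior):

'es.batch.write.retry.count' = '10000', -- retries for documents rejected by an overloaded ES; the docs describe a negative value as retrying indefinitely, which the linked thread questions
'es.batch.write.retry.wait' = '10s'     -- wait between retries of a rejected bulk batch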
The detailed errors follow. The Hadoop job throws this error and stops:
org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[10.1.23.134:9200, es.op.koudai.com:9200, 10.1.23.132:9200, 10.1.23.131:9200, 10.1.23.130:9200, 10.1.23.133:9200]]
2015-11-06 15:40:55,230 ERROR [main] org.elasticsearch.hadoop.rest.NetworkClient: Node [Read timed out] failed (10.1.23.133:9200); no other nodes left - aborting...
2015-11-06 15:40:55,259 FATAL [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"user_id":"492923825","is_register":null,"register_time":null}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[10.1.23.134:9200, es.op.koudai.com:9200, 10.1.23.132:9200, 10.1.23.131:9200, 10.1.23.130:9200, 10.1.23.133:9200]]
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:142)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:317)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:313)
at org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:150)
at org.elasticsearch.hadoop.rest.RestRepository.tryFlush(RestRepository.java:209)
at org.elasticsearch.hadoop.rest.RestRepository.flush(RestRepository.java:232)
at org.elasticsearch.hadoop.rest.RestRepository.doWriteToIndex(RestRepository.java:185)
at org.elasticsearch.hadoop.rest.RestRepository.writeProcessedToIndex(RestRepository.java:164)
at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:63)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:753)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
... 9 more
2015-11-06 15:40:55,259 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: 3 finished. closing...
2015-11-06 15:40:55,259 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: RECORDS_IN:879999
2015-11-06 15:40:55,259 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
My mapping config is:
CREATE EXTERNAL TABLE es.buyer_es (
user_id string,
is_register int,
register_time string,
xxx
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'xxx/buyers','es.nodes'= 'xxx',
'es.port'= '9200','es.mapping.id' = 'user_id','es.index.auto.create' = 'true',
'es.batch.size.entries' = '1000','es.batch.write.retry.count' = '10000','es.batch.write.retry.wait' = '10s',
'es.batch.write.refresh' = 'false','es.nodes.discovery' = 'true','es.nodes.client.only' = 'false'
);
My insert/update script is:
INSERT OVERWRITE TABLE es.buyer_es
select * from xxx
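One thing I am considering (a sketch based on the connector docs, not something I have verified on this dataset): the table currently uses the default write operation, index, which re-writes whole documents; switching to upserts keyed on es.mapping.id might behave differently on the update runs:

ALTER TABLE es.buyer_es SET TBLPROPERTIES (
'es.write.operation' = 'upsert' -- update existing docs matched by es.mapping.id, insert the missing ones
);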
Top GitHub Comments
ES does maintain traffic control: when overloaded it starts rejecting documents, and ES-Hadoop waits a bit and then retries only the failed docs. However, by asking for thousands of retries one basically disregards that pushback and keeps retrying over and over, rendering the back-pressure void.
Note that under load a JVM can start GC'ing a lot, which effectively means the node is frozen, not responding to any network calls, and can thus be interpreted as dead. Which is likely the case here: you overload the cluster and keep pushing, the nodes start GC'ing, and the client assumes they have dropped off the network.
115 tasks against 5 ES nodes is simply way too much. CPU is not the only parameter you should take into account; memory is just as important, and so is disk (SSDs are what you are looking for). I recommend monitoring your ES cluster closely, in particular its IO and memory usage, and reading the docs (including this page) and webinars on performance.
As indicated above, reducing the number of tasks to something more like 1-3x the number of shards (so around 15) and increasing the batch size in small steps (1.5x) is likely to yield much better results and, more importantly, allow the job to complete successfully.
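A minimal sketch of that tuning direction in Hive; the specific values below are illustrative assumptions, not settings from this thread, and should be adjusted while monitoring the cluster:

-- With CombineHiveInputFormat (Hive's default), a larger max split size combines
-- the input into fewer splits, hence fewer concurrent tasks writing to ES.
SET mapreduce.input.fileinputformat.split.maxsize=2147483648;
-- Grow the bulk size in ~1.5x steps and keep retries modest so ES back-pressure
-- is respected; a longer HTTP timeout makes GC pauses less likely to look like dead nodes.
ALTER TABLE es.buyer_es SET TBLPROPERTIES (
'es.batch.size.entries' = '1500',
'es.batch.write.retry.count' = '3',
'es.http.timeout' = '5m'
);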
@KrishnaShah123 Please avoid petitioning specific users for help on old issues on GitHub. In the future, we ask that you keep these kinds of questions to the forums; we reserve GitHub for tracking confirmed bugs and feature planning.