
Hive loading data into ES error: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed

See original GitHub issue

Thanks for @costin's advice; I have created a separate issue.

Using Hive 1.2.1 to load data into ES, I succeeded in loading billions of records, but when I try to update that data in ES, the job fails after several hours. Briefly: loading data into an empty index via Hive works, but updating the same large-scale ES data via Hive fails midway.

@costin, I also read another post of yours: https://discuss.elastic.co/t/spark-es-batch-write-retry-count-negative-value-is-ignored/25436/2

es.batch.write.retry.count should work. Note that the connector has two types of retries:

I have no idea which type I encountered: network hiccups or document rejections. And if I set it to a negative number, will that prevent the job from stopping midway?
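For reference, the two retry types map to separate connector settings. A hedged sketch follows; the property names come from the elasticsearch-hadoop configuration docs, but exact defaults may vary by connector version:

```sql
-- Sketch only: the two retry knobs in es-hadoop are configured separately.
-- Network-level retries cover connection drops and timeouts; bulk-rejection
-- retries cover documents that ES rejects when it is overloaded.
ALTER TABLE es.buyer_es SET TBLPROPERTIES (
  'es.http.retries' = '3',               -- network-hiccup retries per node
  'es.batch.write.retry.count' = '3',    -- retries for documents rejected by ES
  'es.batch.write.retry.wait' = '10s'    -- wait between bulk retry attempts
);
```

A negative `es.batch.write.retry.count` means "retry rejected documents indefinitely", but it does not help with node-level connection failures like the one in the log below.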

Also, under the same conditions, fully updating the index fails while recreating the index works. Does updating an existing index cost more ES resources than indexing into a fresh one?

The detailed errors follow.

The Hadoop job throws an error and stops:

org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[10.1.23.134:9200, es.op.koudai.com:9200, 10.1.23.132:9200, 10.1.23.131:9200, 10.1.23.130:9200, 10.1.23.133:9200]]

2015-11-06 15:40:55,230 ERROR [main] org.elasticsearch.hadoop.rest.NetworkClient: Node [Read timed out] failed (10.1.23.133:9200); no other nodes left - aborting...
2015-11-06 15:40:55,259 FATAL [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"user_id":"492923825","is_register":null,"register_time":null"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[10.1.23.134:9200, es.op.koudai.com:9200, 10.1.23.132:9200, 10.1.23.131:9200, 10.1.23.130:9200, 10.1.23.133:9200]] 
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:142)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:317)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:313)
    at org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:150)
    at org.elasticsearch.hadoop.rest.RestRepository.tryFlush(RestRepository.java:209)
    at org.elasticsearch.hadoop.rest.RestRepository.flush(RestRepository.java:232)
    at org.elasticsearch.hadoop.rest.RestRepository.doWriteToIndex(RestRepository.java:185)
    at org.elasticsearch.hadoop.rest.RestRepository.writeProcessedToIndex(RestRepository.java:164)
    at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:63)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:753)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
    ... 9 more

2015-11-06 15:40:55,259 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: 3 finished. closing... 
2015-11-06 15:40:55,259 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: RECORDS_IN:879999
2015-11-06 15:40:55,259 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0

My mapping config is:

CREATE EXTERNAL TABLE es.buyer_es (
  user_id  string,
  is_register int,
  register_time  string,
  xxx
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'xxx/buyers','es.nodes'= 'xxx',
              'es.port'= '9200','es.mapping.id' = 'user_id','es.index.auto.create' = 'true',
              'es.batch.size.entries' = '1000','es.batch.write.retry.count' = '10000','es.batch.write.retry.wait' = '10s',
              'es.batch.write.refresh' = 'false','es.nodes.discovery' = 'true','es.nodes.client.only' = 'false'
             );

My insert/update script is:

INSERT OVERWRITE TABLE es.buyer_es
SELECT * FROM xxx;

Issue Analytics

  • State:closed
  • Created 8 years ago
  • Comments:5 (3 by maintainers)

Top GitHub Comments

3 reactions
costin commented, Nov 15, 2015

ES does maintain traffic control - when overloaded, it starts rejecting documents. ES-Hadoop waits a bit and then retries only the failed docs. However, by asking for 1000 retries one basically disregards this pushback and keeps retrying over and over, rendering the pushback void.

Note that under load, a JVM can start GC'ing heavily, which effectively freezes the node: it stops responding to network calls and can thus be interpreted as dead. That is likely the case here - you overload the cluster, keep pushing, the nodes start GC'ing, and the clients assume they have dropped off the network.

115 tasks against 5 ES nodes is simply way too much. CPU is not the only parameter to take into account; memory is just as important, and so is disk (SSDs are what you are looking for). I recommend monitoring your ES cluster closely, in particular IO and memory usage, and reading the docs (including this page) and webinars on performance.

As indicated above, reducing the number of tasks to roughly 1-3x the number of shards (so about 15) and increasing the batch size in small steps (1.5x) is likely to yield much better results and, more importantly, allow the job to complete successfully.
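The recommendation above can be sketched as a Hive-side tuning pass. The values are illustrative, and `mapreduce.job.running.map.limit` (available in Hadoop 2.7+) is my suggestion for capping concurrent writers, not a setting from the thread:

```sql
-- Cap concurrent map tasks writing to ES at ~1-3x the shard count.
SET mapreduce.job.running.map.limit=15;

-- Grow the bulk size gradually (~1.5x steps) and respect ES pushback
-- instead of retrying 10000 times.
ALTER TABLE es.buyer_es SET TBLPROPERTIES (
  'es.batch.size.entries' = '1500',
  'es.batch.write.retry.count' = '3',
  'es.batch.write.retry.wait' = '30s'
);
```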

1 reaction
jbaiera commented, Apr 19, 2018

@KrishnaShah123 Please avoid petitioning specific users for help on old GitHub issues. In the future, we ask that you keep these kinds of questions to the forums; we reserve GitHub for tracking confirmed bugs and feature planning.
