"Cannot find node with id" exception even when the node is alive and cluster is green.
I am getting the following exception when pushing data from a Hadoop M/R job. When this happens, the node in question is responding and the cluster is healthy (green). There are also plenty of resources on the box: CPU usage is below 30% and free memory is over 50 GB. When the exception occurs, the Hadoop map task fails, gets restarted, and eventually succeeds (possibly by connecting to a different ES node). The errors are not consistent; they are very intermittent.
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot find node with id [Q4pQkOIJSSi2oXRXGUVs8w]
at org.elasticsearch.hadoop.util.Assert.notNull(Assert.java:40)
at org.elasticsearch.hadoop.rest.RestRepository.getWriteTargetPrimaryShards(RestRepository.java:251)
at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.initSingleIndex(EsOutputFormat.java:218)
at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:201)
at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.write(EsOutputFormat.java:159)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:638)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at afi.search.hadoop.es.ESMapper1.map(ESMapper1.java:227)
at afi.search.hadoop.es.ESMapper1.map(ESMapper1.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
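The top frames of the trace (Assert.notNull called from RestRepository.getWriteTargetPrimaryShards) suggest the message comes from a null check made while mapping primary shards to nodes. The sketch below is only an illustration of that kind of check, not the connector's actual code: the class NodeLookupSketch, the method resolvePrimary, and both map parameters are hypothetical names. It shows how a node that is alive can still trigger the error if the node list held by the client does not contain the id that the shard routing refers to.

```java
import java.util.HashMap;
import java.util.Map;

public class NodeLookupSketch {

    // Hypothetical stand-in for an assertion helper: fail when a lookup returned null.
    public static <T> T notNull(T value, String message) {
        if (value == null) {
            throw new IllegalArgumentException(message);
        }
        return value;
    }

    // Resolve the address of the node that holds the primary copy of a shard.
    // shardToNodeId:   shard number -> id of the node hosting the primary (assumed input)
    // nodeIdToAddress: node id -> address, as known to the client (assumed input)
    public static String resolvePrimary(int shard,
                                        Map<Integer, String> shardToNodeId,
                                        Map<String, String> nodeIdToAddress) {
        String nodeId = shardToNodeId.get(shard);
        String address = nodeIdToAddress.get(nodeId);
        return notNull(address, "Cannot find node with id [" + nodeId + "]");
    }

    public static void main(String[] args) {
        Map<Integer, String> shardToNodeId = new HashMap<>();
        shardToNodeId.put(0, "nodeA");
        shardToNodeId.put(1, "nodeB");

        // The client's node list is missing nodeB, even though the shard routing names it.
        Map<String, String> nodeIdToAddress = new HashMap<>();
        nodeIdToAddress.put("nodeA", "10.0.0.1:9200");

        System.out.println(resolvePrimary(0, shardToNodeId, nodeIdToAddress));
        try {
            resolvePrimary(1, shardToNodeId, nodeIdToAddress);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the failure depends only on the two views disagreeing, which would be consistent with a green cluster and a responsive node; whether that is what happens in the reported case is not established by the trace alone.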
Issue Analytics
- State:
- Created 9 years ago
- Comments:29 (13 by maintainers)
Top GitHub Comments
Nothing of interest was showing up in the Elasticsearch master log. I didn’t check the logs on the nodes that reported the error.
I’ll have to double-check on Friday when I’m back at work, but I believe I upgraded from JDK 7u51 to JDK 8u20 (maybe 8u25).
The exception was consistent on repeat runs of the job with the same data. Shutting down the Elasticsearch nodes that were failing to connect appeared to resolve the problem.
I’ll turn up the logging on Friday and report back.
On Wed, Nov 26, 2014 at 4:23 PM, Costin Leau notifications@github.com wrote:
I’m having the same issue on a 20-node Elasticsearch cluster. It seems to have started after I updated my Elasticsearch cluster from JDK 1.7 to JDK 1.8. When I run a load job via Elasticsearch-Spark, several ‘Cannot find node with id …’ errors occur. The same nodes report problems on repeat runs of the same job. If I shut those few nodes down and run the job again, it seems to run error-free. If I restart the entire cluster, the Spark job complains about different nodes.