
"Cannot find node with id" exception even when the node is alive and cluster is green.


I am getting the following exception when pushing data from a Hadoop M/R job. When this happens, the node in question is responding and the cluster is healthy (green). The box also has plenty of resources: CPU usage is below 30% and free memory is over 50 GB. When the exception occurs, the Hadoop map task fails, gets restarted, and eventually succeeds (maybe by connecting to a different ES node). The errors are intermittent rather than consistent.

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot find node with id [Q4pQkOIJSSi2oXRXGUVs8w]
    at org.elasticsearch.hadoop.util.Assert.notNull(Assert.java:40)
    at org.elasticsearch.hadoop.rest.RestRepository.getWriteTargetPrimaryShards(RestRepository.java:251)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.initSingleIndex(EsOutputFormat.java:218)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:201)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.write(EsOutputFormat.java:159)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:638)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at afi.search.hadoop.es.ESMapper1.map(ESMapper1.java:227)
    at afi.search.hadoop.es.ESMapper1.map(ESMapper1.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
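
Reading the trace, the failure comes from a null check inside getWriteTargetPrimaryShards: a node id taken from the shard information is looked up, and nothing is found for it. The sketch below is a simplified, hypothetical model of that kind of lookup, not es-hadoop's actual code. The function name, the two dictionaries, and the idea that the connector's list of known nodes can be missing a node that is alive are all assumptions made for illustration.

```python
# Hypothetical model of the failing lookup. This is NOT es-hadoop's code;
# all names and data shapes here are assumptions for illustration only.

def find_write_targets(primary_shards, nodes_by_id):
    """Map each primary shard to the address of the node that hosts it.

    primary_shards: dict of shard number -> node id (from shard routing)
    nodes_by_id:    dict of node id -> address (from the known-nodes list)
    """
    targets = {}
    for shard, node_id in primary_shards.items():
        node = nodes_by_id.get(node_id)
        if node is None:
            # Mirrors the message in the stack trace: the shard points at a
            # node id that is absent from the known-nodes list.
            raise ValueError(f"Cannot find node with id [{node_id}]")
        targets[shard] = node
    return targets


# The shard routing references a node that the known-nodes list lacks,
# even though that node may be up and the cluster may be green.
known_nodes = {"nodeA": "10.0.0.1:9200"}
shards = {0: "nodeA", 1: "Q4pQkOIJSSi2oXRXGUVs8w"}

try:
    find_write_targets(shards, known_nodes)
except ValueError as err:
    print(err)
```

If the real cause resembles this model, a healthy cluster and this error are not contradictory: the error would say only that the two views of the cluster disagreed when the writer was initialized.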

Issue Analytics

State:
Created 9 years ago
Comments: 29 (13 by maintainers)

Top GitHub Comments


ebradshaw commented, Nov 26, 2014

Nothing of interest was showing up in the Elasticsearch master log. I didn’t check the logs on the nodes that reported the error.

I’ll have to double check Friday when I’m back at work, but I believe I upgraded from 7u51 to 8u20 (maybe 8u25).

The exception was consistent on repeat runs of the job with the same data. Shutting down the Elasticsearch nodes that were failing to connect appeared to resolve the problem.

I’ll turn up the logging on Friday and report back.

On Wed, Nov 26, 2014 at 4:23 PM, Costin Leau notifications@github.com wrote:

@ebradshaw https://github.com/ebradshaw Anything showing up in the logs? What are the exact JDK versions in place (which updates)? Does the exception occur consistently or not? Can you please turn on logging at TRACE level, fire up the job, and report back? Thanks!

— Reply to this email directly or view it on GitHub https://github.com/elasticsearch/elasticsearch-hadoop/issues/243#issuecomment-64712579 .


ebradshaw commented, Nov 26, 2014

I’m having the same issue on a 20-node Elasticsearch cluster. It seems to have started after I updated the cluster from JDK 1.7 to JDK 1.8. When I run a load job via Elasticsearch-Spark, several ‘Cannot find node with id …’ errors occur. The same nodes report problems on repeat runs of the same job. If I shut those few nodes down and run the job again, it seems to run error-free. If I restart the entire cluster, the Spark job complains about different nodes.