"Cannot find node with id" exception even when the node is alive and cluster is green.

I am getting the following exception when pushing data from a Hadoop M/R job. When this happens, the node in question is responding and the cluster is healthy (green). The box also has plenty of resources: CPU usage is below 30% and free memory is over 50G. When the exception occurs, the Hadoop map task fails, gets restarted, and eventually succeeds (maybe by connecting to a different ES node). The errors are intermittent rather than consistent.

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot find node with id [Q4pQkOIJSSi2oXRXGUVs8w]
    at org.elasticsearch.hadoop.util.Assert.notNull(Assert.java:40)
    at org.elasticsearch.hadoop.rest.RestRepository.getWriteTargetPrimaryShards(RestRepository.java:251)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.initSingleIndex(EsOutputFormat.java:218)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:201)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.write(EsOutputFormat.java:159)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:638)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at afi.search.hadoop.es.ESMapper1.map(ESMapper1.java:227)
    at afi.search.hadoop.es.ESMapper1.map(ESMapper1.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
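
The frames above show the failure coming from a not-null assertion inside RestRepository.getWriteTargetPrimaryShards. One plausible reading, which is an assumption and not confirmed by the thread, is that the connector looks up each primary shard's node id in a node listing and the id is absent from that listing. A minimal diagnostic sketch along those lines: it takes the already-parsed JSON bodies of the Elasticsearch `_nodes` and `_search_shards` responses and reports node ids that hold a primary shard but are missing from the node listing. The helper name and the sample payloads are hypothetical.

```python
def missing_shard_nodes(nodes_response, search_shards_response):
    """Return node ids that host a primary shard but are absent from the node listing.

    nodes_response: parsed JSON body of GET /_nodes
    search_shards_response: parsed JSON body of GET /<index>/_search_shards
    """
    known_ids = set(nodes_response.get("nodes", {}))
    missing = set()
    # "shards" is a list of shard groups; each group lists the copies of one shard.
    for shard_group in search_shards_response.get("shards", []):
        for shard_copy in shard_group:
            node_id = shard_copy.get("node")
            if node_id is None or not shard_copy.get("primary"):
                continue  # skip unassigned copies and replicas
            if node_id not in known_ids:
                missing.add(node_id)
    return sorted(missing)


# Hypothetical sample payloads (illustrative only, not taken from the report).
nodes_response = {"nodes": {"node-a": {"name": "es1"}}}
search_shards_response = {
    "shards": [
        [{"index": "logs", "shard": 0, "primary": True, "node": "node-a"}],
        [{"index": "logs", "shard": 1, "primary": True, "node": "node-b"}],
    ]
}

print(missing_shard_nodes(nodes_response, search_shards_response))
```

If the two responses are captured at the moment a task fails, a non-empty result would suggest that the shard routing and the node listing disagree, even while the cluster reports green.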

Issue Analytics

  • State: closed
  • Created 9 years ago
  • Comments: 29 (13 by maintainers)

Top GitHub Comments

2 reactions
ebradshaw commented, Nov 26, 2014

Nothing of interest was showing up in the Elasticsearch master log. I didn’t check the logs on the nodes that reported the error.

I’ll have to double check Friday when I’m back at work, but I believe I upgraded from 7u51 to 8u20 (maybe 8u25).

The exception was consistent on repeat runs of the job with the same data. Shutting down the Elasticsearch nodes that were failing to connect appeared to resolve the problem.

I’ll turn up the logging on Friday and report back.

On Wed, Nov 26, 2014 at 4:23 PM, Costin Leau notifications@github.com wrote:

@ebradshaw anything showing up in the logs? What are the exact JDK versions in place (which updates)? Does the exception occur consistently or not? Can you please turn on logging at TRACE level, run the job, and report back? Thanks!

— Reply to this email directly or view it on GitHub https://github.com/elasticsearch/elasticsearch-hadoop/issues/243#issuecomment-64712579 .

1 reaction
ebradshaw commented, Nov 26, 2014

I’m having the same issue on a 20-node Elasticsearch cluster. It seems to have started after I updated my Elasticsearch cluster from JDK 1.7 to JDK 1.8. When I run a load job via Elasticsearch-Spark, several ‘Cannot find node with id …’ errors occur. The same nodes report problems on repeat runs of the same job. If I shut those few nodes down and run the job again, it seems to run error free. If I restart the entire cluster, the Spark job complains about different nodes.
