Transport response handler not found

CrateDB version: 4.1.2

Environment description:

CentOS 7 (latest)
OpenJDK 11.0.7
Data nodes: 8 CPUs, 64 GB RAM
Node makeup: 48 data, 3 master, 2 ingest, 2 query

The 48 data nodes are split across 2 availability zones (24 per zone).

Problem description:

Our cluster health occasionally gets stuck in yellow, and we have to restart CrateDB on the affected nodes before it returns to green. A Nagios check that runs the ALTER CLUSTER REROUTE RETRY FAILED command usually resolves the problem, but some cases require manual intervention.

Shards typically stay unassigned until we run ALTER CLUSTER REROUTE RETRY FAILED. Here are some logs from the related issue #9748:

shard has exceeded the maximum number of retries [20] on failed allocation attempts - manually execute 'alter cluster....' [unassigned_info[[reason=ALLOCATION_FAILED], at ..... failed to create shard, failure IOException[failed to obtain in-memory shard lock]...


[WARN ][o.e.i.c.IndicesClusterStateService] [hostname][[namespace..partitioned.tablename.someuuid][1]] marking and sending shard failed due to [failed to create shard] java.io.IOException: failed to obtain in-memory shard lock
    at org.elasticsearch.index.IndexService.createShard(IndexService.java:358)
    at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:440)
    at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:112)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:551)
...


[INFO ][o.e.i.s.TransportNodesListShardStoreMetaData] [hostname][namespace..partitioned.tablename.someuuid][1]: failed to obtain shard lock
org.elasticsearch.env.ShardLockObtainFailedException: [namespace..partitioned.tablename.someuuid][1]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [read metadata snapshot]
    at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:748)
    at org.elasticsearch.NodeEnvironment.shardLock(NodeEnvironment.java:663)
    at org.elasticsearch.index.Store.readMetadataSnapshot(Store.java:443)
....
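
For readers who want to reproduce the kind of automated check described above, here is a minimal sketch and not the reporter's actual Nagios plugin. It assumes CrateDB's HTTP SQL endpoint is reachable at `http://localhost:4200/_sql` (the URL is an assumption) and that unassigned shards show up in `sys.shards` with `routing_state = 'UNASSIGNED'`. The function names and the exit-code mapping are hypothetical.

```python
import json
import urllib.request

# Assumption: CrateDB's HTTP endpoint is reachable here; adjust host and port.
CRATE_SQL_URL = "http://localhost:4200/_sql"

UNASSIGNED_SHARDS_STMT = (
    "SELECT schema_name, table_name, id "
    "FROM sys.shards WHERE routing_state = 'UNASSIGNED'"
)
RETRY_STMT = "ALTER CLUSTER REROUTE RETRY FAILED"


def build_request(stmt, url=CRATE_SQL_URL):
    """Build a POST request carrying one SQL statement as JSON."""
    body = json.dumps({"stmt": stmt}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )


def run_sql(stmt, url=CRATE_SQL_URL, timeout=10):
    """Execute a statement and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(stmt, url), timeout=timeout) as resp:
        return json.load(resp)


def check_and_retry(execute=run_sql):
    """Issue a retry if any shard is unassigned.

    Returns a (nagios_exit_code, message) tuple: 0 = OK, 1 = WARNING.
    The `execute` parameter is injectable so the logic can be tested
    without a running cluster.
    """
    rows = execute(UNASSIGNED_SHARDS_STMT).get("rows", [])
    if not rows:
        return 0, "OK: no unassigned shards"
    execute(RETRY_STMT)
    return 1, "WARNING: %d unassigned shard(s), retry issued" % len(rows)
```

A wrapper script could call `check_and_retry()`, print the message, and exit with the returned code. As the report notes, the retry does not always clear the problem, so a check like this only covers the cases that do not need manual intervention.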

After running the retry command, shards get stuck in the RELOCATING state, and the following message is logged at a very high rate:

[WARN ][o.e.t.TransportService][node]Transport response handler not found of id [9285317]
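
To put a number on "a very high rate", a small script can count these warnings in a log file. The sketch below is illustrative only: it matches the message text exactly as quoted above and reports how many lines matched and how many distinct handler ids they mention. The quoted line carries no timestamp, so the sketch does not attempt a per-second rate, and the helper names are hypothetical.

```python
import re
from collections import Counter

# Matches the WARN message quoted above and captures the handler id.
HANDLER_NOT_FOUND = re.compile(
    r"Transport response handler not found of id \[(\d+)\]"
)


def handler_ids(lines):
    """Yield the numeric handler id from every matching log line."""
    for line in lines:
        match = HANDLER_NOT_FOUND.search(line)
        if match:
            yield int(match.group(1))


def summarize(lines):
    """Return (total matching lines, number of distinct handler ids)."""
    counts = Counter(handler_ids(lines))
    return sum(counts.values()), len(counts)
```

Passing an open log file to `summarize` works because file objects iterate line by line. Running it on the log before and after the retry command gives a rough measure of how quickly the warnings accumulate.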

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 17 (8 by maintainers)

Top GitHub Comments

seut commented, Nov 23, 2020 (2 reactions)

We’ve finally found the issue related to the Transport handler not found ... log entries, see https://github.com/crate/crate/pull/10797. Thank you for reporting, it was indeed an issue.

gruselglatz commented, Sep 14, 2020 (1 reaction)

@seut I will get it to you via my colleague @rene-stiams.
