Constant number of under-replicated records
CrateDB version: 2.2.7
Environment: Docker (CentOS 7.4), 12 nodes, 10 GB heap size
Client: crate-python 0.21.1

Problem description:
There is a constant number of under-replicated records that never decreases. The following exception appears in the logs:
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | org.elasticsearch.env.ShardLockObtainFailedException: [.partitioned.events_schelling.04732d1p6cqjidho60o30c1g][2]: obtaining shard lock timed out after 5000ms
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:726) ~[crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:645) ~[crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.index.store.Store.readMetadataSnapshot(Store.java:414) [crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.listStoreMetaData(TransportNodesListShardStoreMetaData.java:153) [crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:112) [crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:64) [crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:140) [crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:262) [crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:258) [crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) [crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544) [crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
Today I also found this in the log:
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | [2018-02-09T11:12:33,542][INFO ][o.e.i.s.TransportNodesListShardStoreMetaData] [hephaistos1crate] [.partitioned.events_schelling.04732d1n6ssjae1k60o30c1g][2]: failed to obtain shard lock
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | org.elasticsearch.env.ShardLockObtainFailedException: [.partitioned.events_schelling.04732d1n6ssjae1k60o30c1g][2]: obtaining shard lock timed out after 5000ms
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:726) ~[crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:645) ~[crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.index.store.Store.readMetadataSnapshot(Store.java:414) [crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.listStoreMetaData(TransportNodesListShardStoreMetaData.java:153) [crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:112) [crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:64) [crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:140) [crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:262) [crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:258) [crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) [crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544) [crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [crate-app-2.2.7.jar:2.2.7]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
hephaistos-crate.0.05njw74j4tug@hephaistos1.localdomain | at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
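With a 12-node cluster emitting many such traces, it helps to know exactly which partitions and shards are affected. The helper below is not part of the original report; it is a small sketch that extracts the index name, shard id, and timeout from ShardLockObtainFailedException log lines like the ones above:

```python
import re

# Matches the exception message, e.g.
# "ShardLockObtainFailedException: [.partitioned.events_schelling.04...][2]:
#  obtaining shard lock timed out after 5000ms"
SHARD_LOCK_RE = re.compile(
    r"ShardLockObtainFailedException: \[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]: "
    r"obtaining shard lock timed out after (?P<timeout>\d+)ms"
)

def parse_shard_lock_error(line):
    """Return (index, shard_id, timeout_ms) for a shard-lock timeout line, else None."""
    m = SHARD_LOCK_RE.search(line)
    if m is None:
        return None
    return m.group("index"), int(m.group("shard")), int(m.group("timeout"))
```

Feeding the full log through this (e.g. per line over `docker service logs`) gives a quick count of which shards keep failing to acquire their lock.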
Steps to reproduce: not known exactly; a possible trigger is that the cluster was upgraded from 2.2.4 to 2.2.7.
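To see which shards are behind, CrateDB's sys.shards system table can be queried for shards whose routing state is not STARTED. The sketch below (not from the report) uses only the standard library and CrateDB's HTTP _sql endpoint; the node URL and default port 4200 are assumptions:

```python
import json
from urllib import request

# Shards that are not fully started; on a healthy cluster this returns no rows.
UNASSIGNED_SHARDS_SQL = (
    "SELECT schema_name, table_name, id, state, routing_state "
    "FROM sys.shards WHERE routing_state != 'STARTED'"
)

def fetch_unassigned_shards(node_url="http://localhost:4200"):
    """POST the statement to CrateDB's HTTP _sql endpoint and return the result rows."""
    req = request.Request(
        node_url + "/_sql",
        data=json.dumps({"stmt": UNASSIGNED_SHARDS_SQL}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["rows"]

if __name__ == "__main__":
    for row in fetch_unassigned_shards():
        print(row)
```

If the same shards stay in this list across runs, that matches the "never decreases" symptom described above.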
Issue Analytics
- State:
- Created 6 years ago
- Comments:20 (10 by maintainers)
Top GitHub Comments
I’m now able to reproduce that there are entries stuck in sys.jobs - I haven’t yet done enough testing to see if I can also get the shard lock errors, but it’s likely related.
We’ll also take a closer look at the jmap issue or come up with an alternative recommendation on how to get heap dumps if CrateDB is used inside docker. But for now I don’t need a heap dump anymore. Thanks so far.
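The comment above mentions entries stuck in sys.jobs. A minimal way to inspect them, sketched with the same assumed HTTP _sql endpoint on port 4200 (and assuming stats collection is enabled so sys.jobs is populated):

```python
import json
from urllib import request

# Oldest currently-active jobs first; "stuck" entries remain in this view indefinitely.
STUCK_JOBS_SQL = "SELECT id, started, stmt FROM sys.jobs ORDER BY started ASC LIMIT 20"

def list_active_jobs(node_url="http://localhost:4200"):
    """Run the statement via CrateDB's HTTP _sql endpoint and return the rows."""
    req = request.Request(
        node_url + "/_sql",
        data=json.dumps({"stmt": STUCK_JOBS_SQL}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["rows"]

if __name__ == "__main__":
    for job_id, started, stmt in list_active_jobs():
        print(job_id, started, stmt)
```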
@mfussenegger just confirming that version 2.3.3 fixed the problem. This issue can be closed.