[BUG] org.opensearch.cluster.coordination.CoordinationStateRejectedException upgrading 2.0.1 to 3.0
Describe the bug
Nodes cannot join back to the cluster after upgrading from 2.0.1 to 3.0.0.
To Reproduce
Failing GHA: https://github.com/cliu123/security/runs/6926965970?check_suite_focus=true
Expected behavior
Upgraded nodes rejoin the cluster and the BWC test run completes successfully.
Plugins
OpenSearch Security (the failure occurs in the security plugin's BWC test suite).
Error logs
WARN ][o.o.c.NodeConnectionsService] [securityBwcCluster0-2] failed to connect to {securityBwcCluster0-1}{Us2JKk3aSU-lIJVCnUQ02w}{5Wnx4OM1RkSweVKL6lkfdA}{127.0.0.1}{127.0.0.1:43131}{dimr}{testattr=test, shard_indexing_pressure_enabled=true} (tried [1] times)
» org.opensearch.transport.ConnectTransportException: [securityBwcCluster0-1][127.0.0.1:43131] connect_exception
» at org.opensearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:1076) ~[opensearch-2.0.1.jar:2.0.1]
» at org.opensearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:215) ~[opensearch-2.0.1.jar:2.0.1]
» at org.opensearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:55) ~[opensearch-core-2.0.1.jar:2.0.1]
» at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
» at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[?:?]
» at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
» at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[?:?]
» at org.opensearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:70) ~[opensearch-core-2.0.1.jar:2.0.1]
» at org.opensearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:81) ~[?:?]
» at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578) ~[?:?]
» at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571) ~[?:?]
» at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550) ~[?:?]
» at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491) ~[?:?]
» at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616) ~[?:?]
» at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609) ~[?:?]
» at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[?:?]
» at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321) ~[?:?]
» at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337) ~[?:?]
» at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710) ~[?:?]
» at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:623) ~[?:?]
» at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:586) ~[?:?]
» at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
» at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[?:?]
» at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
» at java.lang.Thread.run(Thread.java:833) [?:?]
» Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: 127.0.0.1/127.0.0.1:43131
» Caused by: java.net.ConnectException: Connection refused
» at sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
» at sun.nio.ch.Net.pollConnectNow(Net.java:672) ~[?:?]
» at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946) ~[?:?]
» at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330) ~[?:?]
» at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[?:?]
» ... 7 more
» ↓ last 40 non error or warning messages from /home/runner/work/security/security/bwc-test/build/testclusters/securityBwcCluster0-2/logs/opensearch.stdout.log ↓
» [2022-06-16T22:10:14,761][INFO ][o.o.n.Node ] [securityBwcCluster0-2] node name [securityBwcCluster0-2], node ID [jjX5m3WkSLe8eAwhuQhn7A], cluster name [securityBwcCluster0], roles [cluster_manager, remote_cluster_client, data, ingest]
» [2022-06-16T22:10:19,058][INFO ][o.o.t.NettyAllocator ] [securityBwcCluster0-2] creating NettyAllocator with the following configs: [name=unpooled, suggested_max_allocation_size=256kb, factors={opensearch.unsafe.use_unpooled_allocator=null, g1gc_enabled=true, g1gc_region_size=1mb, heap_size=512mb}]
» [2022-06-16T22:10:19,129][INFO ][o.o.d.DiscoveryModule ] [securityBwcCluster0-2] using discovery type [zen] and seed hosts providers [settings, file]
» [2022-06-16T22:10:19,577][INFO ][o.o.n.Node ] [securityBwcCluster0-2] initialized
» [2022-06-16T22:10:19,577][INFO ][o.o.n.Node ] [securityBwcCluster0-2] starting ...
» [2022-06-16T22:10:19,727][INFO ][o.o.t.TransportService ] [securityBwcCluster0-2] publish_address {127.0.0.1:42529}, bound_addresses {[::1]:37963}, {127.0.0.1:42529}
» [2022-06-16T22:10:19,932][DEBUG][o.o.c.c.Coordinator ] [securityBwcCluster0-2] startInitialJoin: coordinator becoming CANDIDATE in term 0 (was null, lastKnownLeader was [Optional.empty])
» [2022-06-16T22:10:19,960][INFO ][o.o.h.AbstractHttpServerTransport] [securityBwcCluster0-2] publish_address {127.0.0.1:46681}, bound_addresses {[::1]:41935}, {127.0.0.1:46681}
» [2022-06-16T22:10:19,962][INFO ][o.o.n.Node ] [securityBwcCluster0-2] started
» [2022-06-16T22:10:19,962][INFO ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-2] Node started
» [2022-06-16T22:10:19,968][INFO ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-2] 0 OpenSearch Security modules loaded so far: []
» [2022-06-16T22:10:20,876][INFO ][o.o.c.c.Coordinator ] [securityBwcCluster0-2] setting initial configuration to VotingConfiguration{jjX5m3WkSLe8eAwhuQhn7A,Us2JKk3aSU-lIJVCnUQ02w,{bootstrap-placeholder}-securityBwcCluster0-0}
» [2022-06-16T22:10:21,093][DEBUG][o.o.c.c.ElectionSchedulerFactory] [securityBwcCluster0-2] scheduling scheduleNextElection{gracePeriod=0s, thisAttempt=0, maxDelayMillis=100, delayMillis=22, ElectionScheduler{attempt=1, ElectionSchedulerFactory{initialTimeout=100ms, backoffTime=100ms, maxTimeout=10s}}}
» [2022-06-16T22:10:21,103][DEBUG][o.o.c.c.Coordinator ] [securityBwcCluster0-2] joinLeaderInTerm: for [{securityBwcCluster0-1}{Us2JKk3aSU-lIJVCnUQ02w}{5Wnx4OM1RkSweVKL6lkfdA}{127.0.0.1}{127.0.0.1:43131}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}] with term 1
» [2022-06-16T22:10:21,104][DEBUG][o.o.c.c.CoordinationState] [securityBwcCluster0-2] handleStartJoin: leaving term [0] due to StartJoinRequest{term=1,node={securityBwcCluster0-1}{Us2JKk3aSU-lIJVCnUQ02w}{5Wnx4OM1RkSweVKL6lkfdA}{127.0.0.1}{127.0.0.1:43131}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}}
» [2022-06-16T22:10:21,129][DEBUG][o.o.c.c.ElectionSchedulerFactory] [securityBwcCluster0-2] scheduleNextElection{gracePeriod=0s, thisAttempt=0, maxDelayMillis=100, delayMillis=22, ElectionScheduler{attempt=1, ElectionSchedulerFactory{initialTimeout=100ms, backoffTime=100ms, maxTimeout=10s}}} starting election
» [2022-06-16T22:10:21,131][DEBUG][o.o.c.c.ElectionSchedulerFactory] [securityBwcCluster0-2] scheduling scheduleNextElection{gracePeriod=500ms, thisAttempt=1, maxDelayMillis=200, delayMillis=645, ElectionScheduler{attempt=2, ElectionSchedulerFactory{initialTimeout=100ms, backoffTime=100ms, maxTimeout=10s}}}
» [2022-06-16T22:10:21,134][DEBUG][o.o.c.c.PreVoteCollector ] [securityBwcCluster0-2] PreVotingRound{preVotesReceived={}, electionStarted=false, preVoteRequest=PreVoteRequest{sourceNode={securityBwcCluster0-2}{jjX5m3WkSLe8eAwhuQhn7A}{haqxPg4aSa23pHJFt9_XUA}{127.0.0.1}{127.0.0.1:42529}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, currentTerm=1}, isClosed=false} requesting pre-votes from [{securityBwcCluster0-2}{jjX5m3WkSLe8eAwhuQhn7A}{haqxPg4aSa23pHJFt9_XUA}{127.0.0.1}{127.0.0.1:42529}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, {securityBwcCluster0-1}{Us2JKk3aSU-lIJVCnUQ02w}{5Wnx4OM1RkSweVKL6lkfdA}{127.0.0.1}{127.0.0.1:43131}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, {securityBwcCluster0-0}{MntY5LcxSzCwoWnBav5SUA}{QagsblRtSNihbQhNLBZukQ}{127.0.0.1}{127.0.0.1:46477}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}]
» [2022-06-16T22:10:21,157][DEBUG][o.o.c.c.PreVoteCollector ] [securityBwcCluster0-2] PreVotingRound{preVotesReceived={{securityBwcCluster0-0}{MntY5LcxSzCwoWnBav5SUA}{QagsblRtSNihbQhNLBZukQ}{127.0.0.1}{127.0.0.1:46477}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}=PreVoteResponse{currentTerm=1, lastAcceptedTerm=0, lastAcceptedVersion=0}}, electionStarted=false, preVoteRequest=PreVoteRequest{sourceNode={securityBwcCluster0-2}{jjX5m3WkSLe8eAwhuQhn7A}{haqxPg4aSa23pHJFt9_XUA}{127.0.0.1}{127.0.0.1:42529}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, currentTerm=1}, isClosed=false} added PreVoteResponse{currentTerm=1, lastAcceptedTerm=0, lastAcceptedVersion=0} from {securityBwcCluster0-0}{MntY5LcxSzCwoWnBav5SUA}{QagsblRtSNihbQhNLBZukQ}{127.0.0.1}{127.0.0.1:46477}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, no quorum yet
» [2022-06-16T22:10:21,159][DEBUG][o.o.c.c.PreVoteCollector ] [securityBwcCluster0-2] PreVotingRound{preVotesReceived={{securityBwcCluster0-0}{MntY5LcxSzCwoWnBav5SUA}{QagsblRtSNihbQhNLBZukQ}{127.0.0.1}{127.0.0.1:46477}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}=PreVoteResponse{currentTerm=1, lastAcceptedTerm=0, lastAcceptedVersion=0}, {securityBwcCluster0-2}{jjX5m3WkSLe8eAwhuQhn7A}{haqxPg4aSa23pHJFt9_XUA}{127.0.0.1}{127.0.0.1:42529}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}=PreVoteResponse{currentTerm=1, lastAcceptedTerm=0, lastAcceptedVersion=0}}, electionStarted=false, preVoteRequest=PreVoteRequest{sourceNode={securityBwcCluster0-2}{jjX5m3WkSLe8eAwhuQhn7A}{haqxPg4aSa23pHJFt9_XUA}{127.0.0.1}{127.0.0.1:42529}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, currentTerm=1}, isClosed=false} added PreVoteResponse{currentTerm=1, lastAcceptedTerm=0, lastAcceptedVersion=0} from {securityBwcCluster0-2}{jjX5m3WkSLe8eAwhuQhn7A}{haqxPg4aSa23pHJFt9_XUA}{127.0.0.1}{127.0.0.1:42529}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, no quorum yet
» [2022-06-16T22:10:21,150][DEBUG][o.o.c.c.PreVoteCollector ] [securityBwcCluster0-2] TransportResponseHandler{PreVoteCollector{state=Tuple [v1=null, v2=PreVoteResponse{currentTerm=1, lastAcceptedTerm=0, lastAcceptedVersion=0}]}, node={securityBwcCluster0-1}{Us2JKk3aSU-lIJVCnUQ02w}{5Wnx4OM1RkSweVKL6lkfdA}{127.0.0.1}{127.0.0.1:43131}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}} failed
» org.opensearch.transport.RemoteTransportException: [securityBwcCluster0-1][127.0.0.1:43131][internal:cluster/request_pre_vote]
» Caused by: org.opensearch.cluster.coordination.CoordinationStateRejectedException: rejecting PreVoteRequest{sourceNode={securityBwcCluster0-2}{jjX5m3WkSLe8eAwhuQhn7A}{haqxPg4aSa23pHJFt9_XUA}{127.0.0.1}{127.0.0.1:42529}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, currentTerm=1} as there is already a leader
» at org.opensearch.cluster.coordination.PreVoteCollector.handlePreVoteRequest(PreVoteCollector.java:162) ~[opensearch-2.0.1.jar:2.0.1]
» at org.opensearch.cluster.coordination.PreVoteCollector.lambda$new$0(PreVoteCollector.java:100) ~[opensearch-2.0.1.jar:2.0.1]
» at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:103) ~[opensearch-2.0.1.jar:2.0.1]
» at org.opensearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:453) ~[opensearch-2.0.1.jar:2.0.1]
» at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:798) ~[opensearch-2.0.1.jar:2.0.1]
» at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.0.1.jar:2.0.1]
» at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
» at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
» at java.lang.Thread.run(Thread.java:833) [?:?]
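The rejection at the bottom of the trace is the interesting part: securityBwcCluster0-1 turns down the pre-vote because, from its point of view, the cluster already has an elected leader, so the restarted node keeps scheduling new election rounds instead of joining. As a rough illustration of that style of guard (a minimal sketch, not the actual OpenSearch PreVoteCollector code; all names here are invented for illustration):

```java
import java.util.Optional;

// Minimal sketch of a pre-vote guard; names and structure are illustrative
// and do not mirror the real org.opensearch.cluster.coordination classes.
final class PreVoteSketch {

    static final class CoordinationStateRejectedException extends RuntimeException {
        CoordinationStateRejectedException(String message) {
            super(message);
        }
    }

    // This node's current view of the elected leader, if any.
    private volatile Optional<String> knownLeader = Optional.empty();

    void handlePreVoteRequest(String candidateNode, long candidateTerm) {
        // A node that already follows a leader refuses to support a new
        // election; this is the "as there is already a leader" rejection
        // seen in the stack trace above.
        if (knownLeader.isPresent()) {
            throw new CoordinationStateRejectedException(
                "rejecting pre-vote from " + candidateNode + " in term " + candidateTerm
                    + " as there is already a leader: " + knownLeader.get());
        }
        // Otherwise the node would answer with its current term and last
        // accepted cluster state so the candidate can count the pre-vote.
    }

    void setKnownLeader(String leaderNodeId) {
        knownLeader = Optional.of(leaderNodeId);
    }
}
```

The practical consequence in this run is that the restarted node never collects a quorum of pre-votes ("no quorum yet" above) and also cannot reach 127.0.0.1:43131 at all (connection refused), so it stays out of the cluster.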
Top GitHub Comments
That’s a good point! I haven’t tried that, as I don’t see any security-specific errors. But I’ll try.
For the folks involved in this issue: while resolving this as by design makes sense, it still leaves our plugin stuck without a path to run CI with BWC tests enabled as we look to migrate to OpenSearch v3.0. I’ve filed https://github.com/opensearch-project/opensearch-plugins/issues/167 to track this question and what our recommendation should be. I am going to disable these tests since they are a roadblock to progress, but I suspect this will come back as a surprise later in the project lifecycle.
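If disabling the upgrade tests is the interim answer, one lightweight option (purely a hypothetical sketch, not what the security repo actually does; the version constants below are invented for illustration) is to skip rather than fail the BWC case when the jump crosses a major version, for example with a JUnit 4 assumption:

```java
import org.junit.Assume;
import org.junit.Test;

public class BwcUpgradeSkipExampleTests {

    // Hypothetical constants; a real test would derive these from the
    // versions the Gradle BWC task wires into the test cluster.
    private static final int OLD_MAJOR = 2;
    private static final int NEW_MAJOR = 3;

    @Test
    public void testNodesRejoinAfterRollingUpgrade() {
        // Skip (rather than fail) when the upgrade crosses a major version,
        // since a mixed 2.x/3.x cluster may not form and the run would only
        // reproduce the connect/pre-vote failures shown in the logs above.
        Assume.assumeTrue(
            "skipping BWC upgrade test across a major version boundary",
            NEW_MAJOR == OLD_MAJOR);

        // ...rolling-upgrade assertions would go here...
    }
}
```

This keeps the suite visible in CI output as skipped rather than silently removed, which makes it easier to re-enable once opensearch-project/opensearch-plugins#167 settles on a recommendation.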