[BUG] org.opensearch.cluster.coordination.CoordinationStateRejectedException upgrading 2.0.1 to 3.0
Describe the bug
Nodes cannot join back to the cluster after upgrading from 2.0.1 to 3.0.0.
To Reproduce
Failing GHA: https://github.com/cliu123/security/runs/6926965970?check_suite_focus=true
Expected behavior
Upgraded nodes rejoin the cluster and the BWC test run completes successfully.
Plugins
OpenSearch Security (the failure occurs in the security plugin's BWC test suite).
Error logs
WARN ][o.o.c.NodeConnectionsService] [securityBwcCluster0-2] failed to connect to {securityBwcCluster0-1}{Us2JKk3aSU-lIJVCnUQ02w}{5Wnx4OM1RkSweVKL6lkfdA}{127.0.0.1}{127.0.0.1:43131}{dimr}{testattr=test, shard_indexing_pressure_enabled=true} (tried [1] times)
» org.opensearch.transport.ConnectTransportException: [securityBwcCluster0-1][127.0.0.1:43131] connect_exception
» at org.opensearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:1076) ~[opensearch-2.0.1.jar:2.0.1]
» at org.opensearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:215) ~[opensearch-2.0.1.jar:2.0.1]
» at org.opensearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:55) ~[opensearch-core-2.0.1.jar:2.0.1]
» at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
» at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[?:?]
» at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
» at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[?:?]
» at org.opensearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:70) ~[opensearch-core-2.0.1.jar:2.0.1]
» at org.opensearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:81) ~[?:?]
» at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578) ~[?:?]
» at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571) ~[?:?]
» at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550) ~[?:?]
» at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491) ~[?:?]
» at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616) ~[?:?]
» at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609) ~[?:?]
» at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[?:?]
» at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321) ~[?:?]
» at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337) ~[?:?]
» at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710) ~[?:?]
» at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:623) ~[?:?]
» at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:586) ~[?:?]
» at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
» at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[?:?]
» at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
» at java.lang.Thread.run(Thread.java:833) [?:?]
» Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: 127.0.0.1/127.0.0.1:43131
» Caused by: java.net.ConnectException: Connection refused
» at sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
» at sun.nio.ch.Net.pollConnectNow(Net.java:672) ~[?:?]
» at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946) ~[?:?]
» at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330) ~[?:?]
» at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[?:?]
» ... 7 more
» ↓ last 40 non error or warning messages from /home/runner/work/security/security/bwc-test/build/testclusters/securityBwcCluster0-2/logs/opensearch.stdout.log ↓
» [2022-06-16T22:10:14,761][INFO ][o.o.n.Node ] [securityBwcCluster0-2] node name [securityBwcCluster0-2], node ID [jjX5m3WkSLe8eAwhuQhn7A], cluster name [securityBwcCluster0], roles [cluster_manager, remote_cluster_client, data, ingest]
» [2022-06-16T22:10:19,058][INFO ][o.o.t.NettyAllocator ] [securityBwcCluster0-2] creating NettyAllocator with the following configs: [name=unpooled, suggested_max_allocation_size=256kb, factors={opensearch.unsafe.use_unpooled_allocator=null, g1gc_enabled=true, g1gc_region_size=1mb, heap_size=512mb}]
» [2022-06-16T22:10:19,129][INFO ][o.o.d.DiscoveryModule ] [securityBwcCluster0-2] using discovery type [zen] and seed hosts providers [settings, file]
» [2022-06-16T22:10:19,577][INFO ][o.o.n.Node ] [securityBwcCluster0-2] initialized
» [2022-06-16T22:10:19,577][INFO ][o.o.n.Node ] [securityBwcCluster0-2] starting ...
» [2022-06-16T22:10:19,727][INFO ][o.o.t.TransportService ] [securityBwcCluster0-2] publish_address {127.0.0.1:42529}, bound_addresses {[::1]:37963}, {127.0.0.1:42529}
» [2022-06-16T22:10:19,932][DEBUG][o.o.c.c.Coordinator ] [securityBwcCluster0-2] startInitialJoin: coordinator becoming CANDIDATE in term 0 (was null, lastKnownLeader was [Optional.empty])
» [2022-06-16T22:10:19,960][INFO ][o.o.h.AbstractHttpServerTransport] [securityBwcCluster0-2] publish_address {127.0.0.1:46681}, bound_addresses {[::1]:41935}, {127.0.0.1:46681}
» [2022-06-16T22:10:19,962][INFO ][o.o.n.Node ] [securityBwcCluster0-2] started
» [2022-06-16T22:10:19,962][INFO ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-2] Node started
» [2022-06-16T22:10:19,968][INFO ][o.o.s.OpenSearchSecurityPlugin] [securityBwcCluster0-2] 0 OpenSearch Security modules loaded so far: []
» [2022-06-16T22:10:20,876][INFO ][o.o.c.c.Coordinator ] [securityBwcCluster0-2] setting initial configuration to VotingConfiguration{jjX5m3WkSLe8eAwhuQhn7A,Us2JKk3aSU-lIJVCnUQ02w,{bootstrap-placeholder}-securityBwcCluster0-0}
» [2022-06-16T22:10:21,093][DEBUG][o.o.c.c.ElectionSchedulerFactory] [securityBwcCluster0-2] scheduling scheduleNextElection{gracePeriod=0s, thisAttempt=0, maxDelayMillis=100, delayMillis=22, ElectionScheduler{attempt=1, ElectionSchedulerFactory{initialTimeout=100ms, backoffTime=100ms, maxTimeout=10s}}}
» [2022-06-16T22:10:21,103][DEBUG][o.o.c.c.Coordinator ] [securityBwcCluster0-2] joinLeaderInTerm: for [{securityBwcCluster0-1}{Us2JKk3aSU-lIJVCnUQ02w}{5Wnx4OM1RkSweVKL6lkfdA}{127.0.0.1}{127.0.0.1:43131}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}] with term 1
» [2022-06-16T22:10:21,104][DEBUG][o.o.c.c.CoordinationState] [securityBwcCluster0-2] handleStartJoin: leaving term [0] due to StartJoinRequest{term=1,node={securityBwcCluster0-1}{Us2JKk3aSU-lIJVCnUQ02w}{5Wnx4OM1RkSweVKL6lkfdA}{127.0.0.1}{127.0.0.1:43131}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}}
» [2022-06-16T22:10:21,129][DEBUG][o.o.c.c.ElectionSchedulerFactory] [securityBwcCluster0-2] scheduleNextElection{gracePeriod=0s, thisAttempt=0, maxDelayMillis=100, delayMillis=22, ElectionScheduler{attempt=1, ElectionSchedulerFactory{initialTimeout=100ms, backoffTime=100ms, maxTimeout=10s}}} starting election
» [2022-06-16T22:10:21,131][DEBUG][o.o.c.c.ElectionSchedulerFactory] [securityBwcCluster0-2] scheduling scheduleNextElection{gracePeriod=500ms, thisAttempt=1, maxDelayMillis=200, delayMillis=645, ElectionScheduler{attempt=2, ElectionSchedulerFactory{initialTimeout=100ms, backoffTime=100ms, maxTimeout=10s}}}
» [2022-06-16T22:10:21,134][DEBUG][o.o.c.c.PreVoteCollector ] [securityBwcCluster0-2] PreVotingRound{preVotesReceived={}, electionStarted=false, preVoteRequest=PreVoteRequest{sourceNode={securityBwcCluster0-2}{jjX5m3WkSLe8eAwhuQhn7A}{haqxPg4aSa23pHJFt9_XUA}{127.0.0.1}{127.0.0.1:42529}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, currentTerm=1}, isClosed=false} requesting pre-votes from [{securityBwcCluster0-2}{jjX5m3WkSLe8eAwhuQhn7A}{haqxPg4aSa23pHJFt9_XUA}{127.0.0.1}{127.0.0.1:42529}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, {securityBwcCluster0-1}{Us2JKk3aSU-lIJVCnUQ02w}{5Wnx4OM1RkSweVKL6lkfdA}{127.0.0.1}{127.0.0.1:43131}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, {securityBwcCluster0-0}{MntY5LcxSzCwoWnBav5SUA}{QagsblRtSNihbQhNLBZukQ}{127.0.0.1}{127.0.0.1:46477}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}]
» [2022-06-16T22:10:21,157][DEBUG][o.o.c.c.PreVoteCollector ] [securityBwcCluster0-2] PreVotingRound{preVotesReceived={{securityBwcCluster0-0}{MntY5LcxSzCwoWnBav5SUA}{QagsblRtSNihbQhNLBZukQ}{127.0.0.1}{127.0.0.1:46477}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}=PreVoteResponse{currentTerm=1, lastAcceptedTerm=0, lastAcceptedVersion=0}}, electionStarted=false, preVoteRequest=PreVoteRequest{sourceNode={securityBwcCluster0-2}{jjX5m3WkSLe8eAwhuQhn7A}{haqxPg4aSa23pHJFt9_XUA}{127.0.0.1}{127.0.0.1:42529}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, currentTerm=1}, isClosed=false} added PreVoteResponse{currentTerm=1, lastAcceptedTerm=0, lastAcceptedVersion=0} from {securityBwcCluster0-0}{MntY5LcxSzCwoWnBav5SUA}{QagsblRtSNihbQhNLBZukQ}{127.0.0.1}{127.0.0.1:46477}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, no quorum yet
» [2022-06-16T22:10:21,159][DEBUG][o.o.c.c.PreVoteCollector ] [securityBwcCluster0-2] PreVotingRound{preVotesReceived={{securityBwcCluster0-0}{MntY5LcxSzCwoWnBav5SUA}{QagsblRtSNihbQhNLBZukQ}{127.0.0.1}{127.0.0.1:46477}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}=PreVoteResponse{currentTerm=1, lastAcceptedTerm=0, lastAcceptedVersion=0}, {securityBwcCluster0-2}{jjX5m3WkSLe8eAwhuQhn7A}{haqxPg4aSa23pHJFt9_XUA}{127.0.0.1}{127.0.0.1:42529}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}=PreVoteResponse{currentTerm=1, lastAcceptedTerm=0, lastAcceptedVersion=0}}, electionStarted=false, preVoteRequest=PreVoteRequest{sourceNode={securityBwcCluster0-2}{jjX5m3WkSLe8eAwhuQhn7A}{haqxPg4aSa23pHJFt9_XUA}{127.0.0.1}{127.0.0.1:42529}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, currentTerm=1}, isClosed=false} added PreVoteResponse{currentTerm=1, lastAcceptedTerm=0, lastAcceptedVersion=0} from {securityBwcCluster0-2}{jjX5m3WkSLe8eAwhuQhn7A}{haqxPg4aSa23pHJFt9_XUA}{127.0.0.1}{127.0.0.1:42529}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, no quorum yet
» [2022-06-16T22:10:21,150][DEBUG][o.o.c.c.PreVoteCollector ] [securityBwcCluster0-2] TransportResponseHandler{PreVoteCollector{state=Tuple [v1=null, v2=PreVoteResponse{currentTerm=1, lastAcceptedTerm=0, lastAcceptedVersion=0}]}, node={securityBwcCluster0-1}{Us2JKk3aSU-lIJVCnUQ02w}{5Wnx4OM1RkSweVKL6lkfdA}{127.0.0.1}{127.0.0.1:43131}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}} failed
» org.opensearch.transport.RemoteTransportException: [securityBwcCluster0-1][127.0.0.1:43131][internal:cluster/request_pre_vote]
» Caused by: org.opensearch.cluster.coordination.CoordinationStateRejectedException: rejecting PreVoteRequest{sourceNode={securityBwcCluster0-2}{jjX5m3WkSLe8eAwhuQhn7A}{haqxPg4aSa23pHJFt9_XUA}{127.0.0.1}{127.0.0.1:42529}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}, currentTerm=1} as there is already a leader
» at org.opensearch.cluster.coordination.PreVoteCollector.handlePreVoteRequest(PreVoteCollector.java:162) ~[opensearch-2.0.1.jar:2.0.1]
» at org.opensearch.cluster.coordination.PreVoteCollector.lambda$new$0(PreVoteCollector.java:100) ~[opensearch-2.0.1.jar:2.0.1]
» at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:103) ~[opensearch-2.0.1.jar:2.0.1]
» at org.opensearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:453) ~[opensearch-2.0.1.jar:2.0.1]
» at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:798) ~[opensearch-2.0.1.jar:2.0.1]
» at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.0.1.jar:2.0.1]
» at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
» at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
» at java.lang.Thread.run(Thread.java:833) [?:?]
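The rejection at the bottom of the trace is the interesting part: securityBwcCluster0-1 turns down the pre-vote because, from its point of view, the cluster already has an elected leader, so the restarted node keeps scheduling new election rounds instead of joining. As a rough illustration of that style of guard (a minimal sketch, not the actual OpenSearch PreVoteCollector code; all names here are invented for illustration):

```java
import java.util.Optional;

// Minimal sketch of a pre-vote guard; names and structure are illustrative
// and do not mirror the real org.opensearch.cluster.coordination classes.
final class PreVoteSketch {

    static final class CoordinationStateRejectedException extends RuntimeException {
        CoordinationStateRejectedException(String message) {
            super(message);
        }
    }

    // This node's current view of the elected leader, if any.
    private volatile Optional<String> knownLeader = Optional.empty();

    void handlePreVoteRequest(String candidateNode, long candidateTerm) {
        // A node that already follows a leader refuses to support a new
        // election; this is the "as there is already a leader" rejection
        // seen in the stack trace above.
        if (knownLeader.isPresent()) {
            throw new CoordinationStateRejectedException(
                "rejecting pre-vote from " + candidateNode + " in term " + candidateTerm
                    + " as there is already a leader: " + knownLeader.get());
        }
        // Otherwise the node would answer with its current term and last
        // accepted cluster state so the candidate can count the pre-vote.
    }

    void setKnownLeader(String leaderNodeId) {
        knownLeader = Optional.of(leaderNodeId);
    }
}
```

The practical consequence in this run is that the restarted node never collects a quorum of pre-votes ("no quorum yet" above) and also cannot reach 127.0.0.1:43131 at all (connection refused), so it stays out of the cluster.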
Top GitHub Comments
That’s a good point! I haven’t tried that, as I don’t see any security-specific errors. But I’ll try.
For the folks involved in this issue: while resolving this as by design makes sense, it still leaves our plugin stuck without a path to run CI with BWC tests enabled as we look to migrate to OpenSearch v3.0. I’ve filed https://github.com/opensearch-project/opensearch-plugins/issues/167 to track this question and what our recommendation should be. I am going to disable these tests since they are a roadblock to progress, but I suspect this will come back as a surprise later in the project lifecycle.
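If disabling the upgrade tests is the interim answer, one lightweight option (purely a hypothetical sketch, not what the security repo actually does; the version constants below are invented for illustration) is to skip rather than fail the BWC case when the jump crosses a major version, for example with a JUnit 4 assumption:

```java
import org.junit.Assume;
import org.junit.Test;

public class BwcUpgradeSkipExampleTests {

    // Hypothetical constants; a real test would derive these from the
    // versions the Gradle BWC task wires into the test cluster.
    private static final int OLD_MAJOR = 2;
    private static final int NEW_MAJOR = 3;

    @Test
    public void testNodesRejoinAfterRollingUpgrade() {
        // Skip (rather than fail) when the upgrade crosses a major version,
        // since a mixed 2.x/3.x cluster may not form and the run would only
        // reproduce the connect/pre-vote failures shown in the logs above.
        Assume.assumeTrue(
            "skipping BWC upgrade test across a major version boundary",
            NEW_MAJOR == OLD_MAJOR);

        // ...rolling-upgrade assertions would go here...
    }
}
```

This keeps the suite visible in CI output as skipped rather than silently removed, which makes it easier to re-enable once opensearch-project/opensearch-plugins#167 settles on a recommendation.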