Insufficient number of racks to distribute each replica
Hey @becketqin or @efeg, I used the POST endpoint to remove a broker, and Cruise Control started replica movements as expected. It had ~430GB of data to move and went smoothly until it reached 371GB, where it got stuck. I let it run overnight only to find it still stuck in the morning. So I ran the stop-execution POST and the status changed to "Stopping Execution". Unfortunately, that status never changed for hours, until I had to restart Cruise Control and manually delete /admin/reassign_partitions from ZooKeeper to abort the process. Now, when I start Cruise Control or trigger a rebalance from it, I get this error:
[2018-06-13 12:31:55,478] ERROR Error processing POST request '/rebalance' due to: 'com.linkedin.kafka.cruisecontrol.exception.OptimizationFailureException: Insufficient number of racks to distribute each replica.'. (com.linkedin.kafka.cruisecontrol.servlet.KafkaCruiseControlServlet)
java.util.concurrent.ExecutionException: com.linkedin.kafka.cruisecontrol.exception.OptimizationFailureException: Insufficient number of racks to distribute each replica.
Could you please suggest what could be wrong? Is there some fine-tuning to do in the config?
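For reference, the requests I used were along these lines; the host and port are placeholders, and the endpoint names follow Cruise Control's REST API:

```
# Decommission a broker (broker ID is an example), then stop the resulting execution:
curl -X POST "http://cruise-control-host:9090/kafkacruisecontrol/remove_broker?brokerid=3&dryrun=false"
curl -X POST "http://cruise-control-host:9090/kafkacruisecontrol/stop_proposal_execution"

# Rebalance call that now fails with the OptimizationFailureException above:
curl -X POST "http://cruise-control-host:9090/kafkacruisecontrol/rebalance?dryrun=false"
```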
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
This is not a config issue, but rather a side effect of stopping the rebalance and deleting the partition reassignment (we ran into this several times). If you run
`kafka-topics --zookeeper $ZK_CONNECT --describe`
and look at the list of replicas for your topics, you'll see some partitions that have more replicas than the replication factor they were created with. For example:

```
kafka-topics --zookeeper $ZK_HOSTS --describe

Topic:ExampleTopic    PartitionCount:4    ReplicationFactor:2    Configs:retention.ms=2592000000,cleanup.policy=delete,compression.type=gzip
    Topic: ExampleTopic    Partition: 0    Leader: 7    Replicas: 7,9,8,3    Isr: 7,9
    Topic: ExampleTopic    Partition: 1    Leader: 8    Replicas: 8,7    Isr: 8,7
    Topic: ExampleTopic    Partition: 2    Leader: 8    Replicas: 8,9    Isr: 8,9
    Topic: ExampleTopic    Partition: 3    Leader: 0    Replicas: 0,8    Isr: 0,8
Topic:ExampleTopic-2    PartitionCount:2    ReplicationFactor:2    Configs:retention.ms=2592000000,cleanup.policy=delete,compression.type=gzip
    Topic: ExampleTopic-2    Partition: 0    Leader: 5    Replicas: 5,7    Isr: 5,7
    Topic: ExampleTopic-2    Partition: 1    Leader: 1    Replicas: 1    Isr: 1
```
ExampleTopic Partition 0 here is over-replicated, because the partition movement was cancelled while it was trying to move the replicas from 7,9 to 8,3. This causes Cruise Control to think it needs to place those replicas on different racks, and you don't have enough racks for that many replicas. The solution is to find all the partitions in your cluster that are over-replicated (or under-replicated, too!), build a manual partition assignment JSON file, and submit it for reassignment (a sketch for spotting them follows the commands below). In this example, ExampleTopic-2 also has Partition 1 under-replicated:
cat partitions.json:

```json
{
  "partitions": [
    { "topic": "ExampleTopic", "partition": 0, "replicas": [7, 9] },
    { "topic": "ExampleTopic-2", "partition": 1, "replicas": [1, 3] }
  ],
  "version": 1
}
```
Submit this JSON file for reassignment, then verify until it's complete:

```
kafka-reassign-partitions --zookeeper $ZK_CONNECT --reassignment-json-file partitions.json --execute
kafka-reassign-partitions --zookeeper $ZK_CONNECT --reassignment-json-file partitions.json --verify
```
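To spot over- and under-replicated partitions without eyeballing the whole describe output, something like this awk sketch can help. It assumes the classic `kafka-topics --describe` layout shown above (topic header lines carrying `ReplicationFactor:`, partition lines carrying `Replicas:`), which varies a bit between Kafka versions, so treat it as a starting point and double-check the flagged partitions by hand:

```
kafka-topics --zookeeper $ZK_CONNECT --describe | awk '
  # Remember the expected replication factor from each topic header line.
  /ReplicationFactor:/ {
    for (i = 1; i <= NF; i++)
      if ($i ~ /^ReplicationFactor:/) { split($i, a, ":"); rf = a[2] }
  }
  # On partition lines, compare the actual replica count against the expected RF.
  /Partition:/ && /Replicas:/ {
    for (i = 1; i <= NF; i++)
      if ($i == "Replicas:") n = split($(i + 1), r, ",")
    if (n != rf) print "mis-replicated (expected RF=" rf ", found " n "): " $0
  }'
```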
In the future, when a CC rebalance gets stuck, I have found the cause to be replica fetcher threads dying on brokers. That keeps the new partition assignments from ever joining the ISR set, so CC eventually hits its limit for concurrent moves and effectively locks up. My workaround was to wait for CC to get stuck (no change in finished partition movements over a few hours), look for partitions whose ISR was smaller than their replica set, and restart the brokers that were missing from the ISR, which got them replicating again.
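A rough way to list those partitions and the broker IDs missing from the ISR (same caveat as above about the `--describe` output format) could be:

```
kafka-topics --zookeeper $ZK_CONNECT --describe | awk '
  /Partition:/ && /Isr:/ {
    for (i = 1; i <= NF; i++) {
      if ($i == "Replicas:") replicas = $(i + 1)
      if ($i == "Isr:")      isr      = $(i + 1)
    }
    nr = split(replicas, r, ","); ni = split(isr, s, ",")
    if (ni >= nr) next
    # Collect replica brokers that are not in the ISR (restart candidates).
    missing = ""
    for (j = 1; j <= nr; j++) {
      found = 0
      for (k = 1; k <= ni; k++) if (r[j] == s[k]) found = 1
      if (!found) missing = missing r[j] " "
    }
    print $0 "  -> brokers missing from ISR: " missing
  }'
```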
@efeg / @becketqin - it might be a useful feature to have a goal that fixes mis-replicated partitions, as generating the partitions JSON file is a very manual process. I'm not sure how it would be implemented, though, since the replication factor appears to be derived rather than stored as a config anywhere. Additionally, some way to save a proposal to a file and execute it later would be extremely useful. In our case, dead replica fetcher threads on brokers caused 5 rebalances to fail, and it would have been much more efficient to resume executing the old proposal instead of generating a new one and wasting replica moves (e.g. if a proposal dies with 800/1000 moves complete, the next proposal might have 500 partition movements instead of just the 200 that were left incomplete from the first one).
@ursdeepak2000 The `/load` (per broker) or `/partition_load` (per partition) endpoints would be helpful.
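For reference, both are GET endpoints on the Cruise Control REST API; the host and port below are placeholders:

```
curl "http://cruise-control-host:9090/kafkacruisecontrol/load?json=true"
curl "http://cruise-control-host:9090/kafkacruisecontrol/partition_load?json=true"
```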