Cruise control not auto rebalancing after first rebalance

Cruise Control version: v0.1.10

I am testing scaling up Kafka clusters running on Kubernetes using Cruise Control. As part of scaling up, I create new Kafka broker pods and then rely on Cruise Control to trigger a cluster rebalance that assigns partitions to the newly added brokers, as per this suggestion by @efeg: https://github.com/linkedin/cruise-control/issues/344#issuecomment-426094872.

I also found a similar issue (https://github.com/linkedin/cruise-control/issues/906) to this one where the ReplicaDistributionGoal was missing in the anomaly.detection.goals. Based on this, I have added the ReplicaDistributionGoal to my anomaly.detection.goals config and I am relying on cluster rebalancing to be triggered because of this goal failing when new brokers are added.

However, I noticed that Cruise Control triggered a cluster rebalance and assigned partitions to the newly created brokers only ONCE. To be sure that auto-rebalancing was working, I scaled down the cluster and then added brokers back to it. This time Cruise Control did not trigger a cluster rebalance: partitions were not assigned to the new brokers, and the new brokers remained idle until I manually triggered a cluster rebalance.

I am attaching some screenshots from the cc-ui for reference.

Brokers 4 and 5 were the new brokers that were launched. It seems that only newly created topics were assigned to these brokers; the existing data was not rebalanced and existing partitions were not assigned to them.
[Screenshot: Screen Shot 2020-07-16 at 8 12 40 AM]

Cruise Control Monitor State:
[Screenshot: Screen Shot 2020-07-16 at 8 12 53 AM]

Cruise Control Analyzer State: Note that the ReplicaDistributionGoal seems to be ready to be analyzed.
[Screenshot: Screen Shot 2020-07-16 at 8 13 01 AM]

Anomaly Detector State: Notice that the ReplicaDistributionGoal violation was detected only ONCE, when I added a broker earlier. When I added more brokers again, the violation was never detected and self-healing did not get triggered.
[Screenshot: Screen Shot 2020-07-16 at 8 13 32 AM]

Cruise Control Proposals: It seems that Cruise Control thinks it has already fixed the ReplicaDistributionGoal violation. However, as the first screenshot shows, self-healing did not get triggered and the cluster did not auto-rebalance. The Cruise Control logs also did not show any log line for triggering self-healing.
[Screenshot: Screen Shot 2020-07-16 at 8 13 48 AM]

Configurations

These are the relevant configurations that I have set up Cruise Control with:

self.healing.enabled=true

default.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal

goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.PotentialNwOutGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderBytesInDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.kafkaassigner.KafkaAssignerDiskUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.kafkaassigner.KafkaAssignerEvenRackAwareGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.PreferredLeaderElectionGoal

hard.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal

anomaly.detection.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal
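A goal list this long is easy to get wrong by hand, and issue #906 came down to ReplicaDistributionGoal being absent from anomaly.detection.goals. The following is a minimal sketch of a sanity check for that, not part of Cruise Control itself. The helper names (parse_properties, goal_list, has_goal, missing_from) are hypothetical, and SAMPLE is an abbreviated stand-in for the real properties file.

```python
# Sketch only: helper names are hypothetical and SAMPLE is an abbreviated
# stand-in for the configuration shown above.
PREFIX = "com.linkedin.kafka.cruisecontrol.analyzer.goals."

SAMPLE = "\n".join([
    "self.healing.enabled=true",
    "default.goals=" + ",".join(PREFIX + g for g in (
        "ReplicaDistributionGoal", "ReplicaCapacityGoal", "DiskCapacityGoal")),
    "hard.goals=" + ",".join(PREFIX + g for g in (
        "ReplicaCapacityGoal", "DiskCapacityGoal")),
    "anomaly.detection.goals=" + ",".join(PREFIX + g for g in (
        "ReplicaDistributionGoal", "ReplicaCapacityGoal", "DiskCapacityGoal")),
])


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def goal_list(props, key):
    """Return the comma-separated goal class names configured under key."""
    return [g.strip() for g in props.get(key, "").split(",") if g.strip()]


def has_goal(props, key, simple_name):
    """True if a goal whose class name ends in simple_name is listed under key."""
    return any(g.rsplit(".", 1)[-1] == simple_name for g in goal_list(props, key))


def missing_from(props, subset_key, superset_key):
    """Goals listed under subset_key that do not appear under superset_key."""
    superset = set(goal_list(props, superset_key))
    return [g for g in goal_list(props, subset_key) if g not in superset]


if __name__ == "__main__":
    props = parse_properties(SAMPLE)
    print("ReplicaDistributionGoal in anomaly.detection.goals:",
          has_goal(props, "anomaly.detection.goals", "ReplicaDistributionGoal"))
    print("anomaly goals missing from default.goals:",
          missing_from(props, "anomaly.detection.goals", "default.goals"))
```

For the configuration in this issue the check passes: ReplicaDistributionGoal is listed under anomaly.detection.goals, and that list matches default.goals, so the missing rebalance is not explained by the goal lists alone.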


Top GitHub Comments

1 reaction

mhaseebmlk commented, Jul 16, 2020

@efeg that seems to have fixed the problem. Thanks for your help!

1 reaction

efeg commented, Jul 16, 2020

@mhaseebmlk To add on https://github.com/linkedin/cruise-control/issues/1270#issuecomment-659609862, note that for REST API requests, you can explicitly set exclude_recently_removed_brokers=false to avoid excluding recently removed brokers. The current default value (i.e. value used if missing) for this parameter is true (see code).
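As a rough illustration of the REST parameter mentioned above, the sketch below builds a rebalance request URL with exclude_recently_removed_brokers=false set explicitly. The host name, the port 9090, and the /kafkacruisecontrol/rebalance path are assumptions about a typical deployment and should be adjusted to match yours. The function names rebalance_url and trigger_rebalance are hypothetical. The dryrun parameter is left on so that nothing is executed by accident.

```python
from urllib.parse import urlencode, urlunsplit
from urllib.request import Request, urlopen


def rebalance_url(host, port=9090, dryrun=True,
                  exclude_recently_removed_brokers=False):
    """Build a rebalance URL that sets the exclusion flag explicitly.

    host, port and the endpoint path are assumptions about the deployment.
    """
    query = urlencode({
        # Boolean query parameters are sent as lowercase strings.
        "dryrun": str(dryrun).lower(),
        "exclude_recently_removed_brokers":
            str(exclude_recently_removed_brokers).lower(),
    })
    return urlunsplit(
        ("http", f"{host}:{port}", "/kafkacruisecontrol/rebalance", query, ""))


def trigger_rebalance(url, timeout=30):
    """POST the request and return the response body as text."""
    request = Request(url, method="POST")
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # "cruise-control.example" is a placeholder host, not a real service.
    print(rebalance_url("cruise-control.example"))
```

Building the URL is kept separate from sending it so the request can be inspected first. Only call trigger_rebalance with dryrun=False once the dry-run proposal looks right.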
