partition moves can severely cripple cluster activity and client acks
I know this is already on the radar from reading Gitter, but I can’t find an issue for it, so I’m creating one. I know most of you will already be familiar with this; I’ll just write something thorough and clear for those who aren’t. If there’s work in progress it would be good to know too!
When allowing Cruise Control to trigger partition reassignments, even when it is configured to allow only 1 concurrent move per broker, it can cripple cluster performance. In our case it severely increases the ack time for some acks=all producers, with the 95th percentile jumping to 15s or more.
This was addressed in Kafka 0.10.1.0 with throttling options (replication quotas) intended to let partition reassignments run with a lighter impact on a live cluster:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas
https://issues.apache.org/jira/browse/KAFKA-1464
The supplied kafka-reassign-partitions.sh script applies the throttles, and the code it calls is here: https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/ReassignPartitionsCommand.scala
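For reference, the throttle that script applies boils down to the KIP-73 configs sketched below. This is only a rough illustration of what ReassignPartitionsCommand does, written against the pre-KIP-117 ZkUtils/AdminUtils helpers; exact helper names, signatures and config-merging behaviour vary between Kafka versions.

```scala
import java.util.Properties

import kafka.admin.AdminUtils
import kafka.utils.ZkUtils

object ReassignThrottleSketch {
  def applyThrottle(zkUtils: ZkUtils,
                    throttleBytesPerSec: Long,
                    brokers: Seq[Int],
                    topic: String,
                    throttledReplicas: String): Unit = {
    // Broker-level rate limits (dynamic broker configs introduced by KIP-73).
    val brokerProps = new Properties()
    brokerProps.setProperty("leader.replication.throttled.rate", throttleBytesPerSec.toString)
    brokerProps.setProperty("follower.replication.throttled.rate", throttleBytesPerSec.toString)
    AdminUtils.changeBrokerConfig(zkUtils, brokers, brokerProps)

    // Topic-level lists of which replicas the rate applies to,
    // e.g. "0:101,0:102" (partition:brokerId pairs) or "*" for all replicas.
    // The real command merges these into the topic's existing config.
    val topicProps = new Properties()
    topicProps.setProperty("leader.replication.throttled.replicas", throttledReplicas)
    topicProps.setProperty("follower.replication.throttled.replicas", throttledReplicas)
    AdminUtils.changeTopicConfig(zkUtils, topic, topicProps)
  }
}
```

The --verify step of the same script is what removes these configs again once the reassignment has completed.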
And Cruise Control currently calls ZkUtils directly to update a partition reassignment: https://github.com/linkedin/cruise-control/blob/master/cruise-control/src/main/scala/com/linkedin/kafka/cruisecontrol/executor/ExecutorUtils.scala#L95
zkUtils.updatePartitionReassignmentData(newReplicaAssignment)
which is this: https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/utils/ZkUtils.scala#L790
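To make the gap concrete, here is a minimal, simplified sketch of that unthrottled path (topic name, partitions and broker ids are made up, and the real executor drives this through its task machinery rather than a single inline call):

```scala
import kafka.common.TopicAndPartition
import kafka.utils.ZkUtils

object UnthrottledReassignSketch extends App {
  val zkUtils = ZkUtils("localhost:2181", 30000, 30000, isZkSecurityEnabled = false)

  // Desired replica lists, keyed by partition (illustrative values).
  val newReplicaAssignment: Map[TopicAndPartition, Seq[Int]] = Map(
    TopicAndPartition("my-topic", 0) -> Seq(1, 2, 3),
    TopicAndPartition("my-topic", 1) -> Seq(2, 3, 4)
  )

  // Writes the plan straight to the /admin/reassign_partitions znode.
  // No replication quota is set anywhere in this path, so the moves replicate
  // at full speed and compete with normal produce/fetch traffic.
  zkUtils.updatePartitionReassignmentData(newReplicaAssignment)

  zkUtils.close()
}
```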
Top GitHub Comments
@brettrann, thanks for creating the issue and providing an overview! Integration with replication quotas, as described in KIP-73, is definitely an important feature and is on our roadmap.
@efeg thanks for the feedback. Overall it sounds like what we’ve proposed is pretty much in line, so we’ll start on an implementation and get you a PR soon (we may start this week; if not, we’ll definitely start next week). However, let me address your questions:
Yes, the throttle should be limited only to ongoing replica movements started by Cruise Control. Basically, there is some code in executeReplicaAssignmentTasks where we’d apply the throttle before each move and then remove the throttle when done, so any other unexpected URPs will not be throttled by Cruise Control.
My thoughts are that if a quota already exists we should leave it be (though I don’t feel strongly about this). I’m assuming that any pre-existing quota is there either because a Kafka cluster administrator set it, or because a previous Cruise Control-executed rebalance was restarted/killed/failed before completing. In both cases I feel it’s safest not to touch the existing quota.
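As a rough illustration of that flow (every helper name here is hypothetical, not existing Cruise Control code):

```scala
object ThrottledExecutionSketch {
  // Hypothetical helpers -- a real implementation would read/write the KIP-73
  // dynamic broker configs (leader/follower.replication.throttled.rate).
  def hasExistingThrottle(broker: Int): Boolean = ???
  def setThrottle(broker: Int, bytesPerSec: Long): Unit = ???
  def removeThrottle(broker: Int): Unit = ???

  // Apply the throttle only for the duration of the moves Cruise Control starts,
  // and never clobber a quota that was already in place.
  def executeWithThrottle(throttleBytesPerSec: Long,
                          brokersInvolved: Set[Int],
                          runTasks: () => Unit): Unit = {
    val brokersWeThrottled = brokersInvolved.filterNot(hasExistingThrottle)
    brokersWeThrottled.foreach(setThrottle(_, throttleBytesPerSec))
    try {
      runTasks() // the existing executeReplicaAssignmentTasks logic
    } finally {
      // Remove only the throttles we set, so unrelated URP recovery elsewhere
      // in the cluster is never slowed down by a leftover quota.
      brokersWeThrottled.foreach(removeThrottle)
    }
  }
}
```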
My thoughts are that for this PR we should focus only on user-initiated rebalances, and that later we can add support for goal-violation self-healing. Supporting goal-violation self-healing is a little more complicated because the throttles need to be captured as part of the goals themselves.
I suggest we start with prior art and do the same as kafka-reassign-partitions.sh (a global throttle that is applied to both leaders and followers), and then we can likely support both cases in a future PR (e.g. allowing a throttle param, a leader_throttle param, and a follower_throttle param, with some logic to determine the final values, or to fail, when different combinations are present; a possible shape for that logic is sketched after the next paragraph).
When we complete a rebalance, the question of whether we remove the throttle or reset it back to the pre-existing throttle is a trade-off.
Because Cruise Control does not keep any state, it’s impossible to tell whether an existing throttle is present because a cluster administrator set the quota separately from Cruise Control, or because Cruise Control was restarted/failed/killed while rebalancing. I personally feel it’s important that Cruise Control cleans up after itself and does not leave dangling throttles, but I’m open to feedback on this and can change my mind.
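For the parameter combinations mentioned above, the resolution logic could look something like the following purely illustrative sketch (the throttle / leader_throttle / follower_throttle names are the proposed ones; the helper itself is hypothetical):

```scala
object ThrottleParams {
  // Resolve request parameters into (leaderThrottleRate, followerThrottleRate),
  // both in bytes/sec.
  def resolveThrottles(throttle: Option[Long],
                       leaderThrottle: Option[Long],
                       followerThrottle: Option[Long]): (Long, Long) =
    (throttle, leaderThrottle, followerThrottle) match {
      // Global throttle only: same rate for leaders and followers, matching
      // what kafka-reassign-partitions.sh --throttle does today.
      case (Some(t), None, None) => (t, t)
      // Fully explicit per-side throttles.
      case (None, Some(l), Some(f)) => (l, f)
      // Anything else is ambiguous, so fail fast rather than guess.
      case _ => throw new IllegalArgumentException(
        "Specify either throttle, or both leader_throttle and follower_throttle")
    }
}
```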
Overall it does feel like we’re aligned and that we can probably sort out these questions in the PR. Thanks again for your feedback!