partition moves can severely cripple cluster activity and client acks
I know this is already on the radar from reading Gitter, but I can’t find an issue for it, so I’m creating one. I know most of you will already be familiar with this; I’ll just write something thorough and clear for those who aren’t. If there’s work in progress it would be good to know too!
When allowing Cruise Control to trigger partition reassignments, even when it is configured to allow only 1 concurrent move per broker, it can cripple cluster performance. In our case it severely increases the ack time for some acks=all producers, with the 95th percentile jumping to 15s or more.
This was addressed in Kafka 0.10.1.0 with throttling options (replication quotas) intended to let partition reassignments run with a lighter impact on a live cluster:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas
https://issues.apache.org/jira/browse/KAFKA-1464
The supplied kafka-reassign-partitions.sh script applies the throttles, and the code it calls is here: https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/ReassignPartitionsCommand.scala
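For reference, the throttle that script applies boils down to the KIP-73 configs sketched below. This is only a rough illustration of what ReassignPartitionsCommand does, written against the pre-KIP-117 ZkUtils/AdminUtils helpers; exact helper names, signatures and config-merging behaviour vary between Kafka versions.

```scala
import java.util.Properties

import kafka.admin.AdminUtils
import kafka.utils.ZkUtils

object ReassignThrottleSketch {
  def applyThrottle(zkUtils: ZkUtils,
                    throttleBytesPerSec: Long,
                    brokers: Seq[Int],
                    topic: String,
                    throttledReplicas: String): Unit = {
    // Broker-level rate limits (dynamic broker configs introduced by KIP-73).
    val brokerProps = new Properties()
    brokerProps.setProperty("leader.replication.throttled.rate", throttleBytesPerSec.toString)
    brokerProps.setProperty("follower.replication.throttled.rate", throttleBytesPerSec.toString)
    AdminUtils.changeBrokerConfig(zkUtils, brokers, brokerProps)

    // Topic-level lists of which replicas the rate applies to,
    // e.g. "0:101,0:102" (partition:brokerId pairs) or "*" for all replicas.
    // The real command merges these into the topic's existing config.
    val topicProps = new Properties()
    topicProps.setProperty("leader.replication.throttled.replicas", throttledReplicas)
    topicProps.setProperty("follower.replication.throttled.replicas", throttledReplicas)
    AdminUtils.changeTopicConfig(zkUtils, topic, topicProps)
  }
}
```

The --verify step of the same script is what removes these configs again once the reassignment has completed.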
And Cruise Control currently calls ZkUtils directly to update a partition reassignment: https://github.com/linkedin/cruise-control/blob/master/cruise-control/src/main/scala/com/linkedin/kafka/cruisecontrol/executor/ExecutorUtils.scala#L95
zkUtils.updatePartitionReassignmentData(newReplicaAssignment)
which is this: https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/utils/ZkUtils.scala#L790
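To make the gap concrete, here is a minimal, simplified sketch of that unthrottled path (topic name, partitions and broker ids are made up, and the real executor drives this through its task machinery rather than a single inline call):

```scala
import kafka.common.TopicAndPartition
import kafka.utils.ZkUtils

object UnthrottledReassignSketch extends App {
  val zkUtils = ZkUtils("localhost:2181", 30000, 30000, isZkSecurityEnabled = false)

  // Desired replica lists, keyed by partition (illustrative values).
  val newReplicaAssignment: Map[TopicAndPartition, Seq[Int]] = Map(
    TopicAndPartition("my-topic", 0) -> Seq(1, 2, 3),
    TopicAndPartition("my-topic", 1) -> Seq(2, 3, 4)
  )

  // Writes the plan straight to the /admin/reassign_partitions znode.
  // No replication quota is set anywhere in this path, so the moves replicate
  // at full speed and compete with normal produce/fetch traffic.
  zkUtils.updatePartitionReassignmentData(newReplicaAssignment)

  zkUtils.close()
}
```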
Top GitHub Comments
@brettrann, thanks for creating the issue and providing an overview! Integration with replication quotas, as described in KIP-73, is definitely an important feature and is on our roadmap.
@efeg thanks for the feedback. Overall it sounds like what we’ve proposed is pretty much in line, so we’ll start on an implementation and get you a PR soon (we may start this week; if not, we’ll definitely start next week). However, let me address your questions:
Yes, the throttle should be limited only to ongoing replica movements started by Cruise Control. Basically, there is some code in executeReplicaAssignmentTasks where we’d apply the throttle before each move and then remove the throttle when done, so any other unexpected URPs will not be throttled by Cruise Control.
My thoughts are that if a quota already exists we should leave it be (though I don’t feel strongly about this). I’m assuming that any pre-existing quota is there either because a Kafka cluster administrator set it, or because a previous Cruise Control-executed rebalance was restarted/killed/failed before completing. In both cases I feel it’s safest not to touch the existing quota.
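As a rough illustration of that flow (every helper name here is hypothetical, not existing Cruise Control code):

```scala
object ThrottledExecutionSketch {
  // Hypothetical helpers -- a real implementation would read/write the KIP-73
  // dynamic broker configs (leader/follower.replication.throttled.rate).
  def hasExistingThrottle(broker: Int): Boolean = ???
  def setThrottle(broker: Int, bytesPerSec: Long): Unit = ???
  def removeThrottle(broker: Int): Unit = ???

  // Apply the throttle only for the duration of the moves Cruise Control starts,
  // and never clobber a quota that was already in place.
  def executeWithThrottle(throttleBytesPerSec: Long,
                          brokersInvolved: Set[Int],
                          runTasks: () => Unit): Unit = {
    val brokersWeThrottled = brokersInvolved.filterNot(hasExistingThrottle)
    brokersWeThrottled.foreach(setThrottle(_, throttleBytesPerSec))
    try {
      runTasks() // the existing executeReplicaAssignmentTasks logic
    } finally {
      // Remove only the throttles we set, so unrelated URP recovery elsewhere
      // in the cluster is never slowed down by a leftover quota.
      brokersWeThrottled.foreach(removeThrottle)
    }
  }
}
```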
My thoughts are that for this PR we should focus only on user-initiated rebalances, and that later we can add support for goal-violation self-healing. Supporting goal-violation self-healing is a little more complicated because the throttles need to be captured as part of the goals themselves.
I suggest we start with prior art and do the same as kafka-reassign-partitions.sh (a global throttle that is applied to both leaders and followers), and then we can likely support both cases in a future PR (e.g. allowing a throttle param, a leader_throttle param, and a follower_throttle param, with some logic to determine the final values, or to fail, when different combinations are present; a possible shape for that logic is sketched after the next paragraph).
When we complete a rebalance, the question of whether we remove the throttle or reset it back to the pre-existing throttle is a trade-off.
Because Cruise Control does not keep any state, it’s impossible to tell whether an existing throttle is present because a cluster administrator set the quota separately from Cruise Control, or because Cruise Control was restarted/failed/killed while rebalancing. I personally feel it’s important that Cruise Control cleans up after itself and does not leave dangling throttles, but I’m open to feedback on this and can change my mind.
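For the parameter combinations mentioned above, the resolution logic could look something like the following purely illustrative sketch (the throttle / leader_throttle / follower_throttle names are the proposed ones; the helper itself is hypothetical):

```scala
object ThrottleParams {
  // Resolve request parameters into (leaderThrottleRate, followerThrottleRate),
  // both in bytes/sec.
  def resolveThrottles(throttle: Option[Long],
                       leaderThrottle: Option[Long],
                       followerThrottle: Option[Long]): (Long, Long) =
    (throttle, leaderThrottle, followerThrottle) match {
      // Global throttle only: same rate for leaders and followers, matching
      // what kafka-reassign-partitions.sh --throttle does today.
      case (Some(t), None, None) => (t, t)
      // Fully explicit per-side throttles.
      case (None, Some(l), Some(f)) => (l, f)
      // Anything else is ambiguous, so fail fast rather than guess.
      case _ => throw new IllegalArgumentException(
        "Specify either throttle, or both leader_throttle and follower_throttle")
    }
}
```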
Overall it does feel like we’re aligned and that we can probably sort out these questions in the PR. Thanks again for your feedback!