Add a sensor to identify a cluster containing partitions with replication factor > number of racks
To enhance the fault tolerance of clusters that have too few racks to provide full rack awareness (i.e. each replica of each partition resides on a different rack), we added a new hard goal called RackAwareDistributionGoal (see https://github.com/linkedin/cruise-control/pull/1345). This goal is a relaxed version of RackAwareGoal and evenly distributes replicas over racks. Unlike RackAwareGoal, it allows multiple replicas of a partition to be placed in a single rack, provided that the replicas of each partition can be evenly distributed across the racks.
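To make the difference between the two goals concrete, here is a minimal sketch in Python. It is not Cruise Control code. The function names are hypothetical, and "evenly distributed" is read here as "per-rack replica counts differ by at most one", which is an assumption about the goal's exact acceptance rule.

```python
from collections import Counter


def is_rack_aware(replica_racks):
    """Strict rack awareness: every replica sits on a different rack."""
    return len(set(replica_racks)) == len(replica_racks)


def is_evenly_distributed(replica_racks, all_racks):
    """Relaxed check (assumed reading of "even"): per-rack replica counts
    for one partition differ by at most one across all racks."""
    if not all_racks:
        return False
    counts = Counter(replica_racks)
    # Racks holding no replica of this partition count as zero.
    per_rack = [counts.get(rack, 0) for rack in all_racks]
    return max(per_rack) - min(per_rack) <= 1


# A partition with RF = 4 in a cluster with 2 racks.
racks = ["r1", "r2"]
placement = ["r1", "r1", "r2", "r2"]
strict_ok = is_rack_aware(placement)
relaxed_ok = is_evenly_distributed(placement, racks)
```

Under these assumptions, the 2 + 2 placement fails the strict check but passes the relaxed one, while a 3 + 1 placement fails both.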
To help identify clusters containing partitions with RF > number of racks, add a sensor that monitoring systems can alert on if needed.
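The condition such a sensor would report can be sketched as follows. This is an illustration in Python rather than the Cruise Control implementation. The function name and the two input mappings (partition to replica broker ids, broker id to rack) are hypothetical stand-ins for cluster metadata.

```python
def rf_exceeds_racks_sensor(partition_replicas, broker_rack):
    """Return 1 if any partition has RF > number of racks, else 0.

    partition_replicas: dict mapping a partition name to its replica broker ids
    broker_rack: dict mapping a broker id to its rack id
    (both are hypothetical inputs for this sketch)
    """
    num_racks = len(set(broker_rack.values()))
    for replicas in partition_replicas.values():
        if len(replicas) > num_racks:
            return 1
    return 0


# Two racks, four brokers; "topic-0" has RF = 3, which exceeds 2 racks.
broker_rack = {1: "r1", 2: "r1", 3: "r2", 4: "r2"}
partition_replicas = {"topic-0": [1, 2, 3], "topic-1": [1, 3]}
sensor_value = rf_exceeds_racks_sensor(partition_replicas, broker_rack)
```

An alerting rule would then fire when the reported value is 1.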
Issue Analytics
- State:
- Created 3 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
For our own purposes, I think the boolean datatype (or its 0/1 integer representation) is sufficient.

I would prefer not to have to de-noise this sensor by comparison to whether there is an ongoing partition reassignment.

The ideal monitoring-ready sensor for me would be:
- 0: No static partitions have RF > the number of racks in the kafka cluster
- 1: At least one static partition has RF > the number of racks in the kafka cluster

Where a “static” partition is one for which there is not an ongoing partition reassignment.

Ideally, this would include partition reassignments from non-cruise-control sources as well, but that may be infeasible to provide in kafka 2.4+

@mgrubent Other than the special value (see https://github.com/linkedin/cruise-control/issues/1365#issuecomment-717438456), it would probably be similar to the sensor that signals whether the cluster has unfixable goals – i.e. 0: no unfixable goal, 1: has unfixable goal.

I would be happy to discuss suggestions that intend to provide a universal solution applicable to all downstream monitoring / alerting frameworks.
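The 0/1 sensor over "static" partitions requested in the comments above could look roughly like the sketch below. It is written in Python for illustration only. The function name and inputs are hypothetical, and how the set of reassigning partitions is obtained (including reassignments started outside cruise-control) is left open, as it is in the discussion.

```python
def static_rf_exceeds_racks_sensor(replication_factors, num_racks, reassigning):
    """Return 1 if at least one static partition has RF > num_racks, else 0.

    replication_factors: dict mapping a partition name to its replication factor
    num_racks: number of racks in the cluster
    reassigning: set of partitions with an ongoing reassignment (assumed to be
        supplied by the caller; these are skipped as non-static)
    """
    return int(any(
        rf > num_racks
        for partition, rf in replication_factors.items()
        if partition not in reassigning
    ))


# "topic-0" exceeds the rack count but is mid-reassignment, so it is ignored.
replication_factors = {"topic-0": 4, "topic-1": 2}
during_reassignment = static_rf_exceeds_racks_sensor(
    replication_factors, num_racks=3, reassigning={"topic-0"})
after_reassignment = static_rf_exceeds_racks_sensor(
    replication_factors, num_racks=3, reassigning=set())
```

Skipping partitions under reassignment means the value does not need to be de-noised downstream: a temporarily inflated replica list during a move does not raise the sensor.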