Add a sensor to identify a cluster containing partitions with replication factor > number of racks

To improve the fault tolerance of clusters that have too few racks for full rack-awareness (i.e. each replica of each partition residing on a different rack), we added a new hard goal called RackAwareDistributionGoal (see https://github.com/linkedin/cruise-control/pull/1345). This goal is a relaxed version of RackAwareGoal that evenly distributes replicas over racks. Unlike RackAwareGoal, it allows multiple replicas of a partition to be placed in a single rack, provided the replicas of each partition can be evenly distributed across the racks.
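To make the difference between the two goals concrete, here is a minimal Python sketch. It is not Cruise Control code: the function names are hypothetical, and "evenly distributed" is assumed to mean that per-rack replica counts for a partition differ by at most one, which the issue does not spell out.

```python
from collections import Counter


def is_strictly_rack_aware(replica_racks):
    """True if every replica of the partition sits on a different rack.

    replica_racks: list of rack ids, one entry per replica (hypothetical input shape).
    """
    return len(set(replica_racks)) == len(replica_racks)


def is_evenly_distributed(replica_racks, all_racks):
    """True if per-rack replica counts differ by at most one.

    ASSUMPTION: this is one plausible reading of "evenly distributes replicas
    over racks"; the real goal may define evenness differently.
    all_racks must be a non-empty list of every rack id in the cluster.
    """
    counts = Counter(replica_racks)
    per_rack = [counts.get(rack, 0) for rack in all_racks]
    return max(per_rack) - min(per_rack) <= 1
```

With two racks and a replication factor of 3, the placement `["r1", "r2", "r1"]` cannot satisfy the strict check, but it passes the relaxed one, while `["r1", "r1", "r1"]` fails both.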

To help identify clusters that contain partitions with a replication factor (RF) greater than the number of racks, add a sensor that monitoring systems can alert on if needed.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
mgrubent commented, Oct 27, 2020

For our own purposes, I think the boolean datatype (or its 0/1 integer representation) is sufficient.

I would prefer not to have to de-noise this sensor by comparison to whether there is an ongoing partition reassignment.

The ideal monitoring-ready sensor for me would be:

  • 0: No static partitions have RF > the number of racks in the kafka cluster
  • 1: At least one static partition has RF > the number of racks in the kafka cluster

Where a “static” partition is one for which there is not an ongoing partition reassignment.
Ideally, this would include partition reassignments from non-cruise-control sources as well, but that may be infeasible to provide in kafka 2.4+
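The 0/1 semantics described above could be sketched as follows. This is an illustration only, not the sensor that was implemented in Cruise Control: `PartitionInfo`, its fields, and `rf_exceeds_racks_sensor` are hypothetical names, and the sketch assumes the caller already knows which partitions are being reassigned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionInfo:
    """Hypothetical snapshot of one partition's state."""
    topic: str
    partition: int
    replication_factor: int
    reassigning: bool  # True while a partition reassignment is ongoing


def rf_exceeds_racks_sensor(partitions, num_racks):
    """Return 1 if any static partition has RF > num_racks, else 0.

    A "static" partition is one with no ongoing reassignment; partitions
    that are being reassigned are skipped so they do not add noise.
    """
    for p in partitions:
        if not p.reassigning and p.replication_factor > num_racks:
            return 1
    return 0
```

For example, in a two-rack cluster a static partition with RF 3 yields 1, whereas the same partition in the middle of a reassignment is ignored and the sensor stays at 0.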

0 reactions
efeg commented, Oct 27, 2020

@mgrubent Other than the special value (see https://github.com/linkedin/cruise-control/issues/1365#issuecomment-717438456), it would probably be similar to the sensor that signals whether the cluster has unfixable goals – i.e. 0: no unfixable goal, 1: has unfixable goal.

I would be happy to discuss suggestions that intend to provide a universal solution applicable to all downstream monitoring / alerting frameworks.
