[META] Spread all copies of shards evenly across all awareness attribute values
Is your feature request related to a problem? Please describe.
In cloud high-availability (HA) deployments, customers usually deploy across multiple zones, and the zone is usually what they configure as the awareness attribute (cluster.routing.allocation.awareness.attributes). However, nothing enforces that all copies of a shard are spread evenly across all zones. This can cause uneven shard distribution and create shard hotspots. If the copies aren't evenly spread out, a failure in a single zone might also cause data loss and unavailability for that shard.
Describe the solution you’d like
There are two possible approaches:
- [Chosen approach] A boolean cluster-level setting, routing.allocation.awareness.balance, which is false by default. When true, we validate that the total number of copies (primary plus replicas) is always a multiple of the awareness attribute value count. If it is not, we throw a validation exception. If there are multiple awareness attributes, the balance must ensure that every value of every awareness attribute is equally balanced. For example, if there are 2 awareness attributes, zone and rack ID, each with 2 possible values, the total number of copies must be a multiple of 2.
- A boolean cluster-level setting, auto_balance_across_awareness_attribute. If this is true, we increase the total number of copies to a multiple of the AZ count. For instance, if there are 3 AZs and an index creation request asks for 7 replicas (8 copies in total), OpenSearch will create 8 replicas so that there are 9 copies in total.
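The arithmetic behind both options can be sketched in a few lines. This is a minimal illustration, assuming each awareness attribute contributes its number of forced values (for example, 3 zones gives 3) and that, with several attributes, the total copy count must divide evenly by every attribute's count, so the sketch uses their least common multiple. Using the LCM, and applying it to the auto-balance option, is an assumption: the issue only spells out the single-attribute case, and OpenSearch may combine attributes differently. The function names are hypothetical and not OpenSearch APIs; `math.lcm` requires Python 3.9 or later.

```python
from functools import reduce
from math import lcm  # Python 3.9+


def required_multiple(value_counts):
    """Hypothetical helper: copy count multiple that divides evenly across every attribute.

    Assumption: with several awareness attributes, use the least common multiple
    of their value counts. An empty list means no constraint (multiple of 1).
    """
    return reduce(lcm, value_counts, 1)


def validate_total_copies(number_of_replicas, value_counts):
    """Approach 1 (reject): raise if the copies cannot be spread evenly."""
    total_copies = number_of_replicas + 1  # primary + replicas
    multiple = required_multiple(value_counts)
    if total_copies % multiple != 0:
        raise ValueError(
            f"total copies {total_copies} must be a multiple of {multiple}"
        )


def auto_balance_replicas(number_of_replicas, value_counts):
    """Approach 2 (auto-expand): round total copies up and return the new replica count."""
    total_copies = number_of_replicas + 1
    multiple = required_multiple(value_counts)
    balanced_copies = -(-total_copies // multiple) * multiple  # ceiling to next multiple
    return balanced_copies - 1


# The 3-AZ example from the issue: 7 replicas (8 copies) become 8 replicas (9 copies).
print(auto_balance_replicas(7, [3]))
```

Rounding up rather than down means the auto-balance option never reduces the redundancy the caller asked for; it only adds replicas.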
Both solutions take effect only when cluster.routing.allocation.awareness.attributes and cluster.routing.allocation.awareness.force.zone.values are set. Otherwise, the setting has no effect.
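The activation condition can be sketched as a check over the flat cluster settings: balancing applies only if the flag is on and at least one listed awareness attribute also has forced values. This is a hypothetical helper operating on a plain dictionary of setting names to comma-separated strings, not OpenSearch's settings API. Skipping attributes that have no forced values is an assumption the issue does not address.

```python
PREFIX = "cluster.routing.allocation.awareness."


def _split_csv(raw):
    """Split a comma-separated setting value, dropping blanks and whitespace."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def forced_value_counts(settings):
    """Return {attribute: number of forced values} for each configured attribute.

    Attributes without a force.<attr>.values setting are skipped (assumption).
    """
    counts = {}
    for attr in _split_csv(settings.get(PREFIX + "attributes", "")):
        values = _split_csv(settings.get(f"{PREFIX}force.{attr}.values", ""))
        if values:
            counts[attr] = len(values)
    return counts


def balance_applies(settings, balance_enabled):
    """Balancing is enforced only when the flag is on and forced values exist."""
    return balance_enabled and bool(forced_value_counts(settings))


settings = {
    PREFIX + "attributes": "zone",
    PREFIX + "force.zone.values": "zone1, zone2, zone3",
}
print(forced_value_counts(settings), balance_applies(settings, True))
```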
Trade-offs
First approach: Plugins such as ISM and CCR need to validate proactively when policies are created and updated. Otherwise, the actions or replication will fail silently at a later point. Whenever new policies or index creation paths are added, we will need to keep adding the validation there to provide a good experience.
Second approach: Since OpenSearch adjusts the replica count itself, plugins and new index creation or modification paths don't need any special handling, so maintenance is very low. However, deviating from the parameters supplied in the API request may not be a good user experience.
User Experience
- User sets cluster.routing.allocation.awareness.attributes and cluster.routing.allocation.awareness.force.zone.values.
- If the user enables routing.allocation.awareness.balance, the total number of copies must be a multiple of the number of possible values of the awareness attribute. If it is not, we will do one of the following:
  - Reject the create/update index request.
  - Auto-expand the replica count as needed.
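The first user step, configuring forced zone awareness and turning on the flag, amounts to a request body for the cluster settings API (PUT _cluster/settings). The sketch below only builds that body; the helper name is hypothetical. The issue writes the flag as routing.allocation.awareness.balance, and the cluster. prefix used here is an assumption made by analogy with the other awareness settings, so check the released documentation for the exact key.

```python
import json


def awareness_settings_body(zones, balance=True):
    """Hypothetical helper: body for PUT _cluster/settings enabling balanced awareness."""
    prefix = "cluster.routing.allocation.awareness."
    return {
        "persistent": {
            prefix + "attributes": "zone",
            # Forced values: every zone the cluster is expected to span.
            prefix + "force.zone.values": ",".join(zones),
            # Flag from this issue; the "cluster." prefix is an assumption.
            prefix + "balance": balance,
        }
    }


body = awareness_settings_body(["zone1", "zone2", "zone3"])
print(json.dumps(body, indent=2))
```

With this configuration and the flag enabled, an index with 2 replicas (3 copies) spreads evenly over the 3 zones, while a request for 1 replica would be rejected or expanded, depending on which approach is in effect.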
Why it should be built
This ensures that an OpenSearch cluster remains well balanced as well as resilient to the failure of a zone, rack, etc.
What will it take to execute?
Changes in OpenSearch as well as in plugins to honor the new flag.
Top GitHub Comments
@gbbafna can this issue be closed? I see https://github.com/opensearch-project/OpenSearch/issues/3461, which tracks the first solution here, with https://github.com/opensearch-project/OpenSearch/pull/3462 as the PR to main and https://github.com/opensearch-project/OpenSearch/pull/4086 as the backport to 2.x.
Thanks! I see it now. Can we also open an issue in the docs repo to track any documentation updates that might need to happen for this?