[DISCUSS] ClusterState feature design


According to #5564

Hi, community

The new Control Panel module, which is planned to consist of Orchestration, Metrics, and OpenTracing, has now been created. Control Panel will provide more powerful features for ShardingSphere in the future. We therefore propose a ClusterState module under Control Panel to enhance the orchestration feature; it will monitor the state and heartbeat of Proxies and Datasources in real time.

The following is the design document; any suggestions are welcome.

Current

Currently, ShardingSphere only stores and displays the Proxy state.

Goals

Add heartbeats between the Proxy and its corresponding Datasources
Monitor the connection between the Proxy and its corresponding Datasources in real time
Generate a real-time topology of Proxies and Datasources, so that online, offline, or abnormal nodes can be displayed intuitively

Overall Design

Control Panel provides HeartBeat and ClusterState modules, responsible for heartbeat detection and node state processing
- Uses scheduled tasks to regularly notify the Proxy to perform heartbeat detection
- Heartbeat detection can be enabled or disabled by configuration
- The detection interval is configurable
The Proxy receives the heartbeat detection notification from Control Panel and creates a heartbeat detection job
- Gets the list of Datasources associated with the current Proxy
- Gets a connection and executes the heartbeat detection SQL (configurable)
- Retry mechanism
  - Whether to retry is configurable
  - The maximum number of retries is configurable
The Proxy receives the Datasource heartbeat detection response
The Proxy notifies Control Panel to update the heartbeat in the Registry Center
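
The Proxy-side job could look roughly like the following Java sketch. `HeartbeatDetectionJob` and its `ConnectionSource` interface are hypothetical names, not ShardingSphere APIs; the sketch only assumes plain JDBC (`Connection.createStatement()` and `Statement.execute(String)`) and leaves retries out.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the Proxy-side heartbeat detection job; not a ShardingSphere API.
final class HeartbeatDetectionJob {

    // Assumed stand-in for however the Proxy borrows a JDBC connection from a Datasource.
    @FunctionalInterface
    interface ConnectionSource {
        Connection getConnection() throws SQLException;
    }

    // Configurable detection SQL, for example "select 1;".
    private final String detectionSql;

    HeartbeatDetectionJob(String detectionSql) {
        this.detectionSql = detectionSql;
    }

    // Probes every Datasource associated with this Proxy once and reports reachability by name.
    Map<String, Boolean> run(Map<String, ConnectionSource> dataSources) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        dataSources.forEach((name, source) -> results.put(name, probe(source)));
        return results;
    }

    // True if a connection is obtained and the detection SQL executes without an SQLException.
    private boolean probe(ConnectionSource source) {
        try (Connection connection = source.getConnection();
             Statement statement = connection.createStatement()) {
            statement.execute(detectionSql);
            return true;
        } catch (SQLException ex) {
            return false;
        }
    }
}
```

Returning a name-to-result map keeps probing separate from state handling, so the retry policy can be applied when the results are reported to Control Panel.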

Process

Start Registration

When the Proxy starts, it initializes its Datasources. After startup completes, the Proxy instance and its heartbeat with each Datasource are saved to the Registry Center.

Timing Synchronization

The Control Panel starts scheduled tasks and regularly notifies all Proxies to perform heartbeat detection
The Proxy determines whether heartbeat detection is required based on the Datasource's last connection timestamp and state
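
A rough sketch of this step follows. The Control Panel side uses a `ScheduledExecutorService` at a fixed rate; `ProxyNotifier` is a hypothetical callback, and the rule in `needsDetection` (always probe a Datasource that is not ONLINE, and skip an ONLINE one that was used within the last interval) is an assumption, since the proposal only says the decision depends on the last connection timestamp and state.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of timing synchronization; names and the skip rule are assumptions.
final class HeartbeatScheduler {

    enum NodeState { ONLINE, INTERRUPT, OFFLINE }

    // Assumed callback Control Panel uses to tell one Proxy to run heartbeat detection.
    @FunctionalInterface
    interface ProxyNotifier {
        void notifyHeartbeatDetection(String proxyInstanceId);
    }

    // Control Panel side: notify every Proxy once per interval, starting immediately.
    static ScheduledExecutorService start(List<String> proxyInstanceIds, ProxyNotifier notifier, long intervalSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                proxyInstanceIds.forEach(notifier::notifyHeartbeatDetection);
            } catch (RuntimeException ex) {
                // An exception escaping the task would suppress all later runs,
                // so it is contained here (a real implementation would log it).
            }
        }, 0, intervalSeconds, TimeUnit.SECONDS);
        return scheduler;
    }

    // Proxy side (assumed rule): recent traffic to an ONLINE Datasource already proves it is reachable.
    static boolean needsDetection(NodeState state, long lastConnectMillis, long nowMillis, long intervalMillis) {
        if (state != NodeState.ONLINE) {
            return true;
        }
        return nowMillis - lastConnectMillis >= intervalMillis;
    }
}
```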

Asynchronous Update

Each time the Proxy interacts with a Datasource, it updates that Datasource's last connection timestamp in the Registry Center.
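
The asynchronous update could be sketched as below, assuming the Registry Center is modelled by an in-memory `ConcurrentHashMap` and the write is handed to an `Executor` so it stays off the query path. `LastConnectUpdater` is a hypothetical name, and keeping the larger timestamp via `merge` is an assumption that guards against out-of-order updates.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

// Hypothetical sketch of the asynchronous lastConnect update; not a ShardingSphere API.
final class LastConnectUpdater {

    // Stand-in for the Registry Center: Datasource name -> last connection timestamp (ms).
    private final Map<String, Long> registry = new ConcurrentHashMap<>();
    private final Executor executor;

    LastConnectUpdater(Executor executor) {
        this.executor = executor;
    }

    // Called after the Proxy finishes an interaction; the write runs on the given executor.
    CompletableFuture<Void> onInteraction(String dataSourceName, long nowMillis) {
        return CompletableFuture.runAsync(
                // Long::max keeps the newest timestamp if updates arrive out of order.
                () -> registry.merge(dataSourceName, nowMillis, Long::max), executor);
    }

    // Returns null if the Datasource has never been recorded.
    Long lastConnect(String dataSourceName) {
        return registry.get(dataSourceName);
    }
}
```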

Storage Structure

```
/
├─orchestration-namespace
├─orchestration-name
│ ├─registry
│ │ ├─instances
│ │ │ ├─127.0.0.1@24668@xxx
│ │ │ ├─127.0.0.1@24668@xxx
```

Instance node content

```yaml
instanceState: DISABLE
datasources:
    sharding_db.ds_1:
        state: Node state
        lastConnect: Timestamp
        retryCount: RetryCount
    sharding_db.ds_2:
        state: Node state
        lastConnect: Timestamp
        retryCount: RetryCount
```

The runtime Datasources are stored under the Proxy instance node in the Registry Center
When the heartbeat response succeeds, the last connection timestamp is updated and the Datasource state is set to ONLINE
When the heartbeat response fails, the Datasource state is set to INTERRUPT and the Datasource waits for a retry
New Datasources are added to the Registry Center automatically after heartbeat detection
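
The storage rules above might be modelled in memory as follows. `InstanceNode` and `DataSourceRecord` are hypothetical, the YAML is rendered by hand only to keep the example dependency-free, and resetting `retryCount` on success is an assumption. The transition from INTERRUPT to OFFLINE is left out of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of one Proxy instance node; names are illustrative.
final class InstanceNode {

    static final class DataSourceRecord {
        String state;
        long lastConnect;
        int retryCount;
    }

    private final String instanceState;
    // LinkedHashMap keeps Datasources in the order they were first seen.
    private final Map<String, DataSourceRecord> dataSources = new LinkedHashMap<>();

    InstanceNode(String instanceState) {
        this.instanceState = instanceState;
    }

    void onHeartbeat(String dataSourceName, boolean success, long nowMillis) {
        // New Datasources are registered automatically on their first heartbeat result.
        DataSourceRecord entry = dataSources.computeIfAbsent(dataSourceName, name -> new DataSourceRecord());
        if (success) {
            entry.state = "ONLINE";
            entry.lastConnect = nowMillis;
            entry.retryCount = 0; // assumption: success resets the retry count
        } else {
            entry.state = "INTERRUPT"; // waits for retry; OFFLINE handling is omitted here
            entry.retryCount++;
        }
    }

    // Renders the node content in the instance node layout.
    String toYaml() {
        StringBuilder yaml = new StringBuilder();
        yaml.append("instanceState: ").append(instanceState).append('\n');
        yaml.append("datasources:\n");
        dataSources.forEach((name, r) -> yaml
                .append("    ").append(name).append(":\n")
                .append("        state: ").append(r.state).append('\n')
                .append("        lastConnect: ").append(r.lastConnect).append('\n')
                .append("        retryCount: ").append(r.retryCount).append('\n'));
        return yaml.toString();
    }
}
```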

States

ONLINE
- Normal heartbeat response
INTERRUPT
- Heartbeat response failed and the maximum number of retries has not been exceeded
OFFLINE
- Heartbeat response failed and the maximum number of retries has been exceeded
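
One way to express these transitions, as a sketch: `retryCount` is assumed to count consecutive failed responses, so with the default maximum of 3 the fourth consecutive failure marks the Datasource OFFLINE. The retry switch follows the rule under Configurations (with retry turned off, a single failure goes straight to OFFLINE). `DataSourceStateMachine` is a hypothetical name.

```java
// Hypothetical sketch of the Datasource state transitions; not a ShardingSphere API.
final class DataSourceStateMachine {

    enum NodeState { ONLINE, INTERRUPT, OFFLINE }

    // New state plus the failure count to persist as retryCount.
    static final class Transition {
        final NodeState state;
        final int retryCount;

        Transition(NodeState state, int retryCount) {
            this.state = state;
            this.retryCount = retryCount;
        }
    }

    static Transition next(boolean heartbeatSucceeded, int retryCount, boolean retryEnabled, int maxRetries) {
        if (heartbeatSucceeded) {
            // Normal heartbeat response.
            return new Transition(NodeState.ONLINE, 0);
        }
        if (!retryEnabled) {
            // Retry switched off: a single failed response marks the Datasource OFFLINE.
            return new Transition(NodeState.OFFLINE, retryCount);
        }
        int failures = retryCount + 1;
        // Within the retry budget: INTERRUPT and wait; beyond it: OFFLINE.
        return failures <= maxRetries
                ? new Transition(NodeState.INTERRUPT, failures)
                : new Transition(NodeState.OFFLINE, failures);
    }
}
```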

Configurations

Heartbeat detection switch
- Enabled by default
- If the switch is turned off, the current state is kept
Heartbeat detection SQL
- Default: select 1;
Heartbeat detection interval
- Default: 60s
Whether to retry
- Enabled by default
- If retry is turned off and the heartbeat response fails, the Datasource state is updated to OFFLINE
Maximum number of retries
- Effective when retry is enabled
- Default 3
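
These settings could be grouped into one immutable object, sketched below with the proposed defaults. The class and field names are hypothetical, since the proposal does not define configuration keys, and the validation rules in the constructor are assumptions.

```java
import java.time.Duration;

// Hypothetical holder for the heartbeat settings; field names are illustrative only.
final class HeartbeatConfiguration {

    final boolean enabled;
    final String detectionSql;
    final Duration interval;
    final boolean retryEnabled;
    final int maxRetries;

    HeartbeatConfiguration(boolean enabled, String detectionSql, Duration interval,
                           boolean retryEnabled, int maxRetries) {
        // Assumed validation: a scheduler needs a positive period, and retries cannot be negative.
        if (interval.isZero() || interval.isNegative()) {
            throw new IllegalArgumentException("interval must be positive");
        }
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must not be negative");
        }
        this.enabled = enabled;
        this.detectionSql = detectionSql;
        this.interval = interval;
        this.retryEnabled = retryEnabled;
        this.maxRetries = maxRetries;
    }

    // Defaults from the proposal: enabled, "select 1;", 60s, retry on, 3 retries.
    static HeartbeatConfiguration defaults() {
        return new HeartbeatConfiguration(true, "select 1;", Duration.ofSeconds(60), true, 3);
    }

    // The maximum number of retries only takes effect when retry is enabled.
    int effectiveMaxRetries() {
        return retryEnabled ? maxRetries : 0;
    }
}
```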

Modules

/shardingsphere
├─control-panel
│ ├─control-panel-cluster
│ │ ├─control-panel-cluster-heartbeat
│ │ ├─control-panel-cluster-state


Top GitHub Comments


dongzl commented, May 17, 2020

Hi @menghaoranss, for the registry center instance nodes, I think using the Proxy port in the instance data is more suitable.

/
├─orchestration-namespace
├─orchestration-name
│ ├─registry
│ │ ├─instances
│ │ │ ├─127.0.0.1@3307@xxx
│ │ │ │ ├─ds_0
│ │ │ │ ├─ds_1
│ │ │ │ ├─ds_2
│ │ │ │ ├─ds_3
│ │ │ ├─127.0.0.1@3308@xxx
│ │ │ │ ├─ds_0
│ │ │ │ ├─ds_1
│ │ │ │ ├─ds_2
│ │ │ │ ├─ds_3

An instance identifier like 127.0.0.1@3308@xxx, built from the IP, port, and other data, is more suitable.


menghaoranss commented, May 20, 2020

Because ZooKeeper EPHEMERAL nodes do not support child nodes, we will persist the Datasource state in the instance node:

instanceState: DISABLE
datasources:
    sharding_db.ds_1:
        state: Node state
        lastConnect: Timestamp
        retryCount: RetryCount
    sharding_db.ds_2:
        state: Node state
        lastConnect: Timestamp
        retryCount: RetryCount
