Disaggregated Coordinator - Multiple discovery servers do not seem to sync discovery information
I have been trying out the multi-coordinator setup from https://prestodb.io/blog/2022/04/15/disggregated-coordinator. This works fine with a single instance of RM (discovery server).
However, while testing with multiple RMs (discovery servers), I saw that the discovery servers were not replicating state among each other.
Consider the following minimal setup
RM1
http-server.http.port=8080
discovery.uri=http://localhost:8080
RM2 (announces to discovery server in RM1)
http-server.http.port=8081
discovery.uri=http://localhost:8080
Coordinator1 (announces to discovery server in RM1)
discovery.uri=http://localhost:8080
Coordinator2 (announces to discovery server in RM2)
discovery.uri=http://localhost:8081
I added some logs in DiscoveryNodeManager#refreshNodesInternal() to periodically select all the services of type=discovery and type=presto:
@ServiceType("discovery") ServiceSelector discoveryServiceSelector
....
....
log.info("All known nodes to selector type: %s = %s", discoveryServiceSelector.getType(),
discoveryServiceSelector.selectAllServices());
log.info("All known nodes to selector type: %s = %s", serviceSelector.getType(),
serviceSelector.selectAllServices());
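As an aside, the "discovery"-typed selector above is only injectable if a binding for it exists in one of the Guice modules. If it is not already bound, a minimal sketch of such a binding could look like the following, assuming the com.facebook.airlift DiscoveryBinder API; the module name is made up for illustration and is not from this thread:

import com.google.inject.Binder;
import com.google.inject.Module;

import static com.facebook.airlift.discovery.client.DiscoveryBinder.discoveryBinder;

public class DiscoverySelectorDebugModule
        implements Module
{
    @Override
    public void configure(Binder binder)
    {
        // Expose services announced with type=discovery so that
        // @ServiceType("discovery") ServiceSelector can be injected into DiscoveryNodeManager.
        discoveryBinder(binder).bindSelector("discovery");
    }
}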
In RM1, RM2, and coordinator1 we see the following logs, which suggest that they can see RM1, RM2, and coordinator1, since all three are querying the discovery server running in RM1.
2022-09-19T12:03:16.293+0530 INFO node-state-poller-0 com.facebook.presto.metadata.DiscoveryNodeManager All known nodes to selector type: discovery = [ServiceDescriptor{id=eab812ac-afa5-492d-8f74-6a7d8fbb2d8c, nodeId=presto-rm1, type=discovery, pool=general, location=/presto-rm1, state=null, properties={http=http://192.168.0.133:8080, http-external=http://192.168.0.133:8080}}, ServiceDescriptor{id=abec555a-2b37-4b52-a666-c6f705dbe7f0, nodeId=presto-rm2, type=discovery, pool=general, location=/presto-rm2, state=null, properties={http=http://192.168.0.133:8081, http-external=http://192.168.0.133:8081}}]
2022-09-19T12:03:16.293+0530 INFO node-state-poller-0 com.facebook.presto.metadata.DiscoveryNodeManager All known nodes to selector type: presto = [ServiceDescriptor{id=658764a6-c7cb-4dd9-bdec-c3d7338a6197, nodeId=presto-rm2, type=presto, pool=general, location=/presto-rm2, state=null, properties={node_version=0.276.1-8c89e4f, coordinator=false, resource_manager=true, catalog_server=false, http=http://192.168.0.133:8081, http-external=http://192.168.0.133:8081, connectorIds=system, thriftServerPort=63721}}, ServiceDescriptor{id=7b91fbcc-f742-4d2f-bee2-aedffe34172f, nodeId=presto-rm1, type=presto, pool=general, location=/presto-rm1, state=null, properties={node_version=0.276.1-8c89e4f, coordinator=false, resource_manager=true, catalog_server=false, http=http://192.168.0.133:8080, http-external=http://192.168.0.133:8080, connectorIds=system, thriftServerPort=63688}}, ServiceDescriptor{id=728e847d-5371-46c7-865f-941f40bf1121, nodeId=presto-c1, type=presto, pool=general, location=/presto-c1, state=RUNNING, properties={node_version=0.276.1-8c89e4f, coordinator=true, resource_manager=false, catalog_server=false, http=http://192.168.0.133:8090, http-external=http://192.168.0.133:8090, connectorIds=hive,tpcds,system,jmx, thriftServerPort=63832}}]
However, in coordinator2 we see that it can only see itself and nothing else, as it is the only node announcing to the discovery server running in RM2.
2022-09-19T12:03:00.819+0530 INFO node-state-poller-0 com.facebook.presto.metadata.DiscoveryNodeManager All known nodes to selector type: discovery = []
2022-09-19T12:03:00.819+0530 INFO node-state-poller-0 com.facebook.presto.metadata.DiscoveryNodeManager All known nodes to selector type: presto = [ServiceDescriptor{id=eb96518f-c7e8-439b-8a64-89a9540503d0, nodeId=presto-c2, type=presto, pool=general, location=/presto-c2, state=RUNNING, properties={node_version=0.276.1-8c89e4f, coordinator=true, resource_manager=false, catalog_server=false, http=http://192.168.0.133:8091, http-external=http://192.168.0.133:8091, connectorIds=hive,tpcds,system,jmx, thriftServerPort=63977}}]
As a result, in coordinator2 we also see an exception (this is just one symptom; there will be others too):
2022-09-19T12:04:06.836+0530 ERROR ResourceGroupManager com.facebook.presto.execution.resourceGroups.InternalResourceGroupManager Error while executing refreshAndStartQueries
com.facebook.drift.client.UncheckedTTransportException: No hosts available
at com.facebook.drift.client.DriftInvocationHandler.invoke(DriftInvocationHandler.java:126)
at com.sun.proxy.$Proxy131.getResourceGroupInfo(Unknown Source)
at com.facebook.presto.resourcemanager.ResourceManagerResourceGroupService.getResourceGroupInfos(ResourceManagerResourceGroupService.java:85)
at com.facebook.presto.resourcemanager.ResourceManagerResourceGroupService.lambda$new$0(ResourceManagerResourceGroupService.java:70)
at com.facebook.presto.resourcemanager.ResourceManagerResourceGroupService.getResourceGroupInfo(ResourceManagerResourceGroupService.java:79)
at com.facebook.presto.execution.resourceGroups.InternalResourceGroupManager.refreshResourceGroupRuntimeInfo(InternalResourceGroupManager.java:263)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: com.facebook.drift.protocol.TTransportException: No hosts available
at com.facebook.drift.client.DriftMethodInvocation.fail(DriftMethodInvocation.java:331)
at com.facebook.drift.client.DriftMethodInvocation.nextAttempt(DriftMethodInvocation.java:161)
at com.facebook.drift.client.DriftMethodInvocation.createDriftMethodInvocation(DriftMethodInvocation.java:115)
at com.facebook.drift.client.DriftMethodHandler.invoke(DriftMethodHandler.java:97)
at com.facebook.drift.client.DriftInvocationHandler.invoke(DriftInvocationHandler.java:94)
... 12 more
Suppressed: com.facebook.drift.client.RetriesFailedException: No hosts available (invocationAttempts: 0, duration: 68.41us, failedConnections: 0, overloadedRejects: 0, attemptedAddresses: [])
at com.facebook.drift.client.DriftMethodInvocation.fail(DriftMethodInvocation.java:337)
... 16 more
Questions
- These discovery servers are supposed to talk to each other and replicate the information; is this a correct understanding?
- Do we need to set some extra configs, apart from the minimal ones mentioned here, to get the replication working? I saw a config named service-inventory.uri being used in the replicator code of the discovery server; does that play a role here, and what does it need to point to?
- Even if we set discovery.uri to a load balancer URL behind which multiple discovery servers run, the deployment may still not be consistent. The reason is that the load balancer gets hit by the announcement calls from all the nodes, and it may forward calls from certain nodes to one discovery server and from other nodes to different discovery servers (the load balancing algorithm does not matter). This is non-deterministic, and might leave the cluster in a state where some nodes are known to one discovery server while others are known to another.
- Am I missing something here?
Top GitHub Comments
service-inventory.uri is the missing config that is not documented as part of the minimal configuration (will go ahead and add that). The current implementation in the discovery server supports two options: discovery.uri is supposed to point to the VIP, so if one of the resource managers is down, another one is picked up; and once you have service-inventory.uri set, all resource managers in the cluster will have all node information.
Sorry for the late response and for the confusion. Let me know if you need any help with this.
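To make that concrete, here is a minimal sketch of what the extra config and the inventory document could look like. The host names, file location, and ids below are placeholders (assumptions, not from this thread), and the JSON shape assumes the Airlift service-inventory format: an environment name, which presumably has to match node.environment, plus a list of service descriptors of type discovery.

On each resource manager, in addition to the minimal settings above:
service-inventory.uri=http://config-host/service-inventory.json

On coordinators and workers, discovery.uri pointing at the VIP in front of both RMs rather than at a single RM:
discovery.uri=http://discovery-vip:8080

service-inventory.json, listing every discovery server (one entry per resource manager):
{
  "environment": "production",
  "services": [
    {
      "id": "00000000-0000-0000-0000-000000000001",
      "type": "discovery",
      "pool": "general",
      "properties": {"http": "http://192.168.0.133:8080"}
    },
    {
      "id": "00000000-0000-0000-0000-000000000002",
      "type": "discovery",
      "pool": "general",
      "properties": {"http": "http://192.168.0.133:8081"}
    }
  ]
}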
thanks for the response @swapsmagic, will try this out.