Client doesn't fall back to other hosts when one host in a multi-host service URL can't be connected
Describe the bug
When I create a Pulsar client with “pulsar://host1:6650,host2:6650,127.0.0.1:6650” as the service URL, where “host1” and “host2” are wrong (unresolvable) hosts, the client does not always connect to “127.0.0.1:6650” successfully.
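For reference, a multi-host service URL is a single scheme followed by a comma-separated list of `host:port` pairs. The sketch below, assuming exactly that format, splits such a URL into unresolved addresses. `MultiHostServiceUrl` and `parseHosts` are hypothetical names, not the Pulsar client's own parser, and IPv6 literals and default ports are deliberately not handled.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

public class MultiHostServiceUrl {

    // Hypothetical helper: "pulsar://host1:6650,host2:6650" -> [host1:6650, host2:6650]
    public static List<InetSocketAddress> parseHosts(String serviceUrl) {
        int schemeEnd = serviceUrl.indexOf("://");
        if (schemeEnd < 0) {
            throw new IllegalArgumentException("Missing scheme: " + serviceUrl);
        }
        String authority = serviceUrl.substring(schemeEnd + 3);
        // Drop any path component that follows the host list
        int slash = authority.indexOf('/');
        if (slash >= 0) {
            authority = authority.substring(0, slash);
        }
        List<InetSocketAddress> hosts = new ArrayList<>();
        for (String hostPort : authority.split(",")) {
            int colon = hostPort.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Missing port in: " + hostPort);
            }
            String host = hostPort.substring(0, colon);
            int port = Integer.parseInt(hostPort.substring(colon + 1));
            // createUnresolved does not perform a DNS lookup at parse time
            hosts.add(InetSocketAddress.createUnresolved(host, port));
        }
        return hosts;
    }

    public static void main(String[] args) {
        for (InetSocketAddress addr : parseHosts("pulsar://host1:6650,host2:6650,127.0.0.1:6650")) {
            System.out.println(addr.getHostString() + " -> " + addr.getPort());
        }
    }
}
```

Keeping the addresses unresolved at parse time matters here: a bad name such as “host1” only fails later, when a connection attempt actually tries to resolve it.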
To Reproduce
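@Test
public void testRetryWithMultiHostServiceUrl() throws PulsarClientException {
    // host1 and host2 cannot be resolved; only 127.0.0.1 runs a broker
    String serviceUrl = "pulsar://host1:6650,host2:6650,127.0.0.1:" + BROKER_PORT;
    PulsarClient client = PulsarClient.builder().serviceUrl(serviceUrl).build();
    // Intermittently throws PulsarClientException (caused by UnknownHostException) here
    Producer<byte[]> producer = client.newProducer().topic("persistent://my-property/my-ns/multi-host").create();
    assertNotNull(producer);
    producer.close();
    client.close();
}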
Sometimes the test fails with “java.net.UnknownHostException”, and sometimes it connects successfully. Here is the stack trace when creating the producer fails:
org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.CompletionException: java.net.UnknownHostException: failed to resolve 'host1' after 6 queries
at org.apache.pulsar.client.impl.ConnectionPool.lambda$null$9(ConnectionPool.java:202)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute$$$capture(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:474)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: java.net.UnknownHostException: failed to resolve 'host1' after 6 queries
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.pulsar.client.impl.ConnectionPool.lambda$resolveName$16(ConnectionPool.java:259)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:485)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
at io.netty.resolver.dns.DnsResolveContext.finishResolve(DnsResolveContext.java:856)
at io.netty.resolver.dns.DnsResolveContext.tryToFinishResolve(DnsResolveContext.java:809)
at io.netty.resolver.dns.DnsResolveContext.query(DnsResolveContext.java:332)
at io.netty.resolver.dns.DnsResolveContext.onResponse(DnsResolveContext.java:495)
at io.netty.resolver.dns.DnsResolveContext.access$400(DnsResolveContext.java:62)
at io.netty.resolver.dns.DnsResolveContext$3.operationComplete(DnsResolveContext.java:376)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:103)
at io.netty.resolver.dns.DnsQueryContext.setSuccess(DnsQueryContext.java:196)
at io.netty.resolver.dns.DnsQueryContext.finish(DnsQueryContext.java:188)
at io.netty.resolver.dns.DnsNameResolver$DnsResponseHandler.channelRead(DnsNameResolver.java:1149)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:93)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:591)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:508)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470)
... 3 more
Caused by: java.net.UnknownHostException: failed to resolve 'host1' after 6 queries
at io.netty.resolver.dns.DnsResolveContext.finishResolve(DnsResolveContext.java:848)
... 32 more
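The trace shows the failure coming from `ConnectionPool.resolveName` for `host1` alone, which suggests that each connection attempt uses a single host from the list and does not fall back to the others. Under that assumption, the intermittent result can be reproduced with a small simulation: a hypothetical round-robin picker that starts at a random index, where only `127.0.0.1` resolves. The picker is an illustration of the symptom, not the client's actual resolver.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

public class SingleHostPickSimulation {

    // Hypothetical picker: returns hosts in round-robin order from startIndex
    static final class RoundRobinPicker {
        private final List<String> hosts;
        private final AtomicInteger index;

        RoundRobinPicker(List<String> hosts, int startIndex) {
            this.hosts = hosts;
            this.index = new AtomicInteger(startIndex);
        }

        String next() {
            return hosts.get(Math.floorMod(index.getAndIncrement(), hosts.size()));
        }
    }

    // Each attempt uses exactly one host and never falls back to the others
    public static int successfulAttempts(List<String> hosts, Set<String> resolvable,
                                         int startIndex, int attempts) {
        RoundRobinPicker picker = new RoundRobinPicker(hosts, startIndex);
        int succeeded = 0;
        for (int i = 0; i < attempts; i++) {
            if (resolvable.contains(picker.next())) {
                succeeded++;
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        List<String> hosts = List.of("host1", "host2", "127.0.0.1");
        Set<String> resolvable = Set.of("127.0.0.1");
        int start = ThreadLocalRandom.current().nextInt(hosts.size());
        // A single attempt (like one producer creation) succeeds only if the pick lands on 127.0.0.1
        System.out.println("first attempt succeeded: "
                + (successfulAttempts(hosts, resolvable, start, 1) == 1));
        // Over nine attempts, exactly three succeed regardless of the start index
        System.out.println(successfulAttempts(hosts, resolvable, start, 9) + " of 9 attempts succeeded");
    }
}
```

With two bad hosts out of three, a single attempt succeeds only about one time in three, which matches the “sometimes it connects” behavior in the test.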
Expected behavior
The Pulsar client should try to connect to each host in turn, and return an error to the user only if none of the hosts can be connected.
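One way to get this behavior is to chain the per-host attempts so that each failure moves on to the next host, and the caller sees an error only after the last host fails. A minimal sketch with `CompletableFuture`, assuming a hypothetical `connectFn` that stands in for the client's per-host connect step; earlier failures are attached to the final error as suppressed exceptions so none of them is lost.

```java
import java.net.UnknownHostException;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Function;

public class FailoverConnect {

    // Tries hosts in order; completes with the first success, or fails with the
    // last error (earlier errors attached as suppressed) once every host has failed.
    public static <T> CompletableFuture<T> connectToAny(
            List<String> hosts, Function<String, CompletableFuture<T>> connectFn) {
        return tryFrom(hosts, 0, connectFn, null);
    }

    private static <T> CompletableFuture<T> tryFrom(
            List<String> hosts, int i, Function<String, CompletableFuture<T>> connectFn, Throwable lastError) {
        if (i >= hosts.size()) {
            return CompletableFuture.failedFuture(
                    lastError != null ? lastError : new IllegalArgumentException("no hosts configured"));
        }
        return connectFn.apply(hosts.get(i))
                .handle((result, error) -> {
                    if (error == null) {
                        return CompletableFuture.completedFuture(result);
                    }
                    // Unwrap the CompletionException added by dependent stages
                    Throwable cause = (error instanceof CompletionException && error.getCause() != null)
                            ? error.getCause() : error;
                    if (lastError != null && lastError != cause) {
                        cause.addSuppressed(lastError);
                    }
                    // Move on to the next host instead of surfacing this error
                    return FailoverConnect.<T>tryFrom(hosts, i + 1, connectFn, cause);
                })
                .thenCompose(Function.identity());
    }

    public static void main(String[] args) {
        List<String> hosts = List.of("host1", "host2", "127.0.0.1");
        // Simulated per-host connect: only 127.0.0.1 succeeds
        Function<String, CompletableFuture<String>> fakeConnect = host -> host.equals("127.0.0.1")
                ? CompletableFuture.completedFuture("connected to " + host)
                : CompletableFuture.<String>failedFuture(new UnknownHostException(host));
        System.out.println(connectToAny(hosts, fakeConnect).join()); // prints "connected to 127.0.0.1"
    }
}
```

The `handle(...).thenCompose(Function.identity())` pair tries the hosts strictly one after another. It needs Java 9+ for `List.of` and `failedFuture`; on Java 12+ `exceptionallyCompose` could express the same fallback more directly.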
Issue Analytics
- Created 4 years ago
- Reactions: 2
- Comments: 6 (3 by maintainers)
Top GitHub Comments
I would like to contribute to this issue.
Here is a possible fix https://github.com/apache/pulsar/pull/18838.
Since our tests depend heavily on the unresolved-host logic, I wrote a best-effort solution that prioritizes the reachable hosts among the candidates.
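The idea of prioritizing reachable hosts can be sketched as a stable reordering of the candidate list: hosts that pass a reachability check go first, and the rest stay at the end as fallbacks instead of being dropped. This is a hypothetical illustration of the idea, not the code in the linked pull request; `prioritize` and `resolves` are assumed names, and `resolves` simply uses `InetAddress.getByName` as the check.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class PrioritizeReachableHosts {

    // Stable reorder: reachable hosts first (original order kept), the rest after them as fallbacks
    public static List<String> prioritize(List<String> hosts, Predicate<String> isReachable) {
        List<String> reachable = new ArrayList<>();
        List<String> fallback = new ArrayList<>();
        for (String host : hosts) {
            if (isReachable.test(host)) {
                reachable.add(host);
            } else {
                fallback.add(host);
            }
        }
        reachable.addAll(fallback);
        return reachable;
    }

    // Example check: does the name resolve? (IP literals resolve without a DNS query)
    public static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> candidates = List.of("host1", "host2", "127.0.0.1");
        // Typically [127.0.0.1, host1, host2], unless host1/host2 resolve on this network
        System.out.println(prioritize(candidates, PrioritizeReachableHosts::resolves));
    }
}
```

Keeping unresolvable hosts at the end, rather than removing them, is what makes this "best effort": a host that fails the check now may still be worth trying if every other candidate fails.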