Producer registration failed when reconnecting

Describe the bug: Producer registration failed, and every retry also failed, when reconnecting after a keep-alive timeout. The logs are as follows:

2021-12-14 15:11:07,284 [ INFO ] ProducerImpl - [persistent://public/default/level2-pressure-1-partition-48] [level2-54-188] Created producer on cnx [id: 0x92ee3683, L:/aa.aa.aa.aa:34398 - R:server.docker.ys/bb.bb.bb.bb:6650]
2021-12-14 15:54:45,473 [ WARN ] PulsarHandler - [[id: 0x92ee3683, L:/aa.aa.aa.aa:34398 - R:server.docker.ys/bb.bb.bb.bb:6650]] Forcing connection to close after keep-alive timeout
2021-12-14 15:54:45,507 [ INFO ] ClientCnx - [id: 0x92ee3683, L:/aa.aa.aa.aa:34398 ! R:server.docker.ys/bb.bb.bb.bb:6650] Disconnected
2021-12-14 15:54:45,507 [ INFO ] ConnectionHandler - [persistent://public/default/level2-pressure-1-partition-48] [level2-54-188] Closed connection [id: 0x92ee3683, L:/aa.aa.aa.aa:34398 ! R:server.docker.ys/bb.bb.bb.bb:6650] -- Will try again in 0.1 s
2021-12-14 15:54:45,608 [ INFO ] ConnectionHandler - [persistent://public/default/level2-pressure-1-partition-48] [level2-54-188] Reconnecting after timeout
2021-12-14 15:54:45,611 [ INFO ] ProducerImpl - [persistent://public/default/level2-pressure-1-partition-48] [level2-54-188] Creating producer on cnx [id: 0x3050a9d3, L:/aa.aa.aa.aa:34172 - R:server.docker.ys/bb.bb.bb.bb:6650]
2021-12-14 15:54:45,630 [ WARN ] ClientCnx - [id: 0x3050a9d3, L:/aa.aa.aa.aa:34172 - R:server.docker.ys/bb.bb.bb.bb:6650] Received error from server: org.apache.pulsar.broker.service.BrokerServiceException$NamingException: Producer with name 'level2-54-188' is already connected to topic
2021-12-14 15:54:45,630 [ ERROR ] ProducerImpl - [persistent://public/default/level2-pressure-1-partition-48] [level2-54-188] Failed to create producer: org.apache.pulsar.broker.service.BrokerServiceException$NamingException: Producer with name 'level2-54-188' is already connected to topic

On the client side, there appears to be a network problem: the client closes the connection and retries, but the server returns the error “Producer is already connected to topic” for each retry.

The server log contains no record of a close event. At 2021-12-14 15:11:07 the server received the registration from port 34398 for the first time. At 2021-12-14 15:54:45 it received a new connection from the same client on port 34172, which tried to register the same producer.

2021-12-14 15:11:07,283 [ INFO ] ServerCnx - [/aa.aa.aa.aa:34398][persistent://public/default/level2-pressure-1-partition-48] Creating producer. producerId=48
2021-12-14 15:11:07,283 [ INFO ] ServerCnx - [/aa.aa.aa.aa:34398] persistent://public/default/level2-pressure-1-partition-48 configured with schema false
2021-12-14 15:11:07,283 [ INFO ] ServerCnx - [/aa.aa.aa.aa:34398] Created new producer: Producer{topic=PersistentTopic{topic=persistent://public/default/level2-pressure-1-partition-48}, client=/aa.aa.aa.aa:34398, producerName=level2-54-188, producerId=48}
2021-12-14 15:54:45,610 [ INFO ] ServerCnx - New connection from /aa.aa.aa.aa:34172
2021-12-14 15:54:45,629 [ INFO ] ServerCnx - [/aa.aa.aa.aa:34172][persistent://public/default/level2-pressure-1-partition-48] Creating producer. producerId=48
2021-12-14 15:54:45,629 [ INFO ] ServerCnx - [/aa.aa.aa.aa:34172] persistent://public/default/level2-pressure-1-partition-48 configured with schema false
2021-12-14 15:54:45,629 [ ERROR ] ServerCnx - [/aa.aa.aa.aa:34172] Failed to add producer to topic persistent://public/default/level2-pressure-1-partition-48: producerId=48, org.apache.pulsar.broker.service.BrokerServiceException$NamingException: Producer with name 'level2-54-188' is already connected to topic

The server never received the channelInactive event for port 34398: there is no “Closed connection from” entry in its log.

    @Override
    public void channelInactive(ChannelHandlerContext ctx) throws Exception {
        super.channelInactive(ctx);
        connectionController.decreaseConnection(ctx.channel().remoteAddress());
        isActive = false;
        log.info("Closed connection from {}", remoteAddress); // this message never appears in the server log for port 34398
        BrokerInterceptor brokerInterceptor = getBrokerService().getInterceptor();
        if (brokerInterceptor != null) {
            brokerInterceptor.onConnectionClosed(this);
        }
        // ... (the excerpt in the original report ends here)
    }

The server still holds a connection on port 34398, but no connection on that port exists on the client. It seems the TCP FIN packet did not reach the server.
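The effect of a close event that never arrives can be pictured with a toy model. This is only a sketch under stated assumptions: `StaleProducerModel`, `addProducer`, and `connectionClosed` are hypothetical names invented for illustration and are not Pulsar's broker code. The model assumes that a producer name stays reserved until the owning connection is reported closed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model, NOT Pulsar code: a producer name stays reserved
// until the connection that registered it is reported as closed.
public class StaleProducerModel {
    // producer name -> client port of the connection that owns it
    private final Map<String, Integer> producersByName = new HashMap<>();

    /** Returns true if the name was free, false if another connection still holds it. */
    public boolean addProducer(String producerName, int clientPort) {
        return producersByName.putIfAbsent(producerName, clientPort) == null;
    }

    /** Stand-in for the cleanup a close event would trigger on the server. */
    public void connectionClosed(int clientPort) {
        producersByName.values().removeIf(port -> port == clientPort);
    }

    public static void main(String[] args) {
        StaleProducerModel broker = new StaleProducerModel();
        // First registration from port 34398 succeeds.
        System.out.println(broker.addProducer("level2-54-188", 34398));
        // The close event for 34398 never arrives, so the retry from 34172 is rejected.
        System.out.println(broker.addProducer("level2-54-188", 34172));
        // Once the stale connection is cleaned up, the same retry succeeds.
        broker.connectionClosed(34398);
        System.out.println(broker.addProducer("level2-54-188", 34172));
    }
}
```

In this model the retry keeps failing for as long as the stale entry survives, which matches the repeated “already connected to topic” errors in the logs above.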

Another problem is that keep-alive does not work on the server. If it did, the connection on port 34398 would be closed and the producer could register again. org.apache.pulsar.common.protocol.PulsarHandler#handleKeepAliveTimeout logs nothing for the connection on port 34398, even with debug-level logging enabled.

One possibility is that an exception is thrown in org.apache.pulsar.common.protocol.PulsarHandler#handleKeepAliveTimeout. The scheduled task would then stop running, because the exception is not caught. “ctx.writeAndFlush” may throw a RuntimeException when there is a network problem.
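The general mechanism behind this hypothesis can be reproduced with the plain JDK scheduler, used here as a stand-in: `ScheduledExecutorService.scheduleAtFixedRate` documents that subsequent executions are suppressed once an execution throws. The sketch below is not Pulsar or Netty code; `KeepAliveTaskDemo` and the simulated failure are assumptions made for illustration, and whether the broker's keep-alive task actually fails this way is not confirmed by the report.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: a periodic task that throws once, with and without a try/catch.
public class KeepAliveTaskDemo {

    /** Schedules a task every 20 ms for about 300 ms; the task throws on its second run. */
    private static int run(boolean guarded) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        executor.scheduleAtFixedRate(() -> {
            try {
                if (runs.incrementAndGet() == 2) {
                    throw new IllegalStateException("simulated network failure");
                }
            } catch (RuntimeException e) {
                if (!guarded) {
                    throw e; // unguarded: the exception escapes and cancels the schedule
                }
                // guarded: swallow (a real task would log) so the schedule keeps running
            }
        }, 0, 20, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(300);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        executor.shutdownNow();
        return runs.get();
    }

    public static int runUnguarded() {
        return run(false);
    }

    public static int runGuarded() {
        return run(true);
    }

    public static void main(String[] args) {
        System.out.println("unguarded runs: " + runUnguarded());
        System.out.println("guarded runs:   " + runGuarded());
    }
}
```

The unguarded task stops after the run that throws, while the guarded one keeps firing. Catching and logging inside a periodic task is the usual way to keep one transient failure from silently disabling it.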

The problem appeared twice. It has not appeared again since we added some logs.

Additional context: Version 2.8.2

Issue Analytics

  • State:closed
  • Created 2 years ago
  • Comments:5 (4 by maintainers)

Top GitHub Comments

3 reactions
michaeljmarshall commented, Dec 17, 2021

@suiyuzeng - thanks for opening this issue. I have been working in this part of the code base recently, so I am happy to take a look. I should be able to sometime in the next few days. I am going to assign this to myself for now.

0 reactions
suiyuzeng commented, Mar 30, 2022

@michaeljmarshall We have switched to the release version, and the problem no longer occurs. Thanks very much.