Producer registration fails when reconnecting
Describe the bug
Producer registration fails, and every retry also fails, when reconnecting after a keep-alive timeout. Logs as follows:
2021-12-14 15:11:07,284 [ INFO ] ProducerImpl - [persistent://public/default/level2-pressure-1-partition-48] [level2-54-188] Created producer on cnx [id: 0x92ee3683, L:/aa.aa.aa.aa:34398 - R:server.docker.ys/bb.bb.bb.bb:6650]
2021-12-14 15:54:45,473 [ WARN ] PulsarHandler - [[id: 0x92ee3683, L:/aa.aa.aa.aa:34398 - R:server.docker.ys/bb.bb.bb.bb:6650]] Forcing connection to close after keep-alive timeout
2021-12-14 15:54:45,507 [ INFO ] ClientCnx - [id: 0x92ee3683, L:/aa.aa.aa.aa:34398 ! R:server.docker.ys/bb.bb.bb.bb:6650] Disconnected
2021-12-14 15:54:45,507 [ INFO ] ConnectionHandler - [persistent://public/default/level2-pressure-1-partition-48] [level2-54-188] Closed connection [id: 0x92ee3683, L:/aa.aa.aa.aa:34398 ! R:server.docker.ys/bb.bb.bb.bb:6650] -- Will try again in 0.1 s
2021-12-14 15:54:45,608 [ INFO ] ConnectionHandler - [persistent://public/default/level2-pressure-1-partition-48] [level2-54-188] Reconnecting after timeout
2021-12-14 15:54:45,611 [ INFO ] ProducerImpl - [persistent://public/default/level2-pressure-1-partition-48] [level2-54-188] Creating producer on cnx [id: 0x3050a9d3, L:/aa.aa.aa.aa:34172 - R:server.docker.ys/bb.bb.bb.bb:6650]
2021-12-14 15:54:45,630 [ WARN ] ClientCnx - [id: 0x3050a9d3, L:/aa.aa.aa.aa:34172 - R:server.docker.ys/bb.bb.bb.bb:6650] Received error from server: org.apache.pulsar.broker.service.BrokerServiceException$NamingException: Producer with name 'level2-54-188' is already connected to topic
2021-12-14 15:54:45,630 [ ERROR ] ProducerImpl - [persistent://public/default/level2-pressure-1-partition-48] [level2-54-188] Failed to create producer: org.apache.pulsar.broker.service.BrokerServiceException$NamingException: Producer with name 'level2-54-188' is already connected to topic
On the client side, there appears to have been a network problem: the client closed the connection and retried. But the server returned the error “Producer is already connected to topic” on each retry.
The server log contains no close event. At 2021-12-14 15:11:07 the server received the first registration, from port 34398. At 2021-12-14 15:54:45 it received a new connection from the same client on port 34172, which tried to register the same producer.
2021-12-14 15:11:07,283 [ INFO ] ServerCnx - [/aa.aa.aa.aa:34398][persistent://public/default/level2-pressure-1-partition-48] Creating producer. producerId=48
2021-12-14 15:11:07,283 [ INFO ] ServerCnx - [/aa.aa.aa.aa:34398] persistent://public/default/level2-pressure-1-partition-48 configured with schema false
2021-12-14 15:11:07,283 [ INFO ] ServerCnx - [/aa.aa.aa.aa:34398] Created new producer: Producer{topic=PersistentTopic{topic=persistent://public/default/level2-pressure-1-partition-48}, client=/aa.aa.aa.aa:34398, producerName=level2-54-188, producerId=48}
2021-12-14 15:54:45,610 [ INFO ] ServerCnx - New connection from /aa.aa.aa.aa:34172
2021-12-14 15:54:45,629 [ INFO ] ServerCnx - [/aa.aa.aa.aa:34172][persistent://public/default/level2-pressure-1-partition-48] Creating producer. producerId=48
2021-12-14 15:54:45,629 [ INFO ] ServerCnx - [/aa.aa.aa.aa:34172] persistent://public/default/level2-pressure-1-partition-48 configured with schema false
2021-12-14 15:54:45,629 [ ERROR ] ServerCnx - [/aa.aa.aa.aa:34172] Failed to add producer to topic persistent://public/default/level2-pressure-1-partition-48: producerId=48, org.apache.pulsar.broker.service.BrokerServiceException$NamingException: Producer with name 'level2-54-188' is already connected to topic
The server never received the channel-inactive event for port 34398: there is no “Closed connection from” log entry for that port.
@Override
public void channelInactive(ChannelHandlerContext ctx) throws Exception {
    super.channelInactive(ctx);
    connectionController.decreaseConnection(ctx.channel().remoteAddress());
    isActive = false;
    log.info("Closed connection from {}", remoteAddress); // this log line never appears for port 34398
    BrokerInterceptor brokerInterceptor = getBrokerService().getInterceptor();
    if (brokerInterceptor != null) {
        brokerInterceptor.onConnectionClosed(this);
    }
    // ... (rest of the method omitted)
}
The server still holds a connection on port 34398, but the client has no connection on that port. It seems the TCP FIN packet never reached the server.
Another problem is that keep-alive does not work on the server. If it worked, the connection on port 34398 would have been closed and the producer could register again. There is no log from org.apache.pulsar.common.protocol.PulsarHandler#handleKeepAliveTimeout about the connection on port 34398, even with debug-level logging enabled.
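To make the expectation concrete, here is a simplified model of a ping/pong keep-alive check. It is only a sketch: the class `KeepAliveModel`, its `Action` enum and its method names are hypothetical and are not Pulsar's actual implementation, which has more branches than this.

```java
// Simplified, hypothetical model of a ping/pong keep-alive check.
class KeepAliveModel {

    enum Action { PING, CLOSE, NONE }

    private boolean waitingForPingResponse = false;
    private boolean open = true;

    // Called once per keep-alive interval.
    Action onInterval() {
        if (!open) {
            return Action.NONE;          // connection already closed
        }
        if (waitingForPingResponse) {
            open = false;                // previous ping was never answered
            return Action.CLOSE;
        }
        waitingForPingResponse = true;   // send a probe and wait for the reply
        return Action.PING;
    }

    // Called when the peer answers the probe.
    void onPong() {
        waitingForPingResponse = false;
    }

    boolean isOpen() {
        return open;
    }

    public static void main(String[] args) {
        KeepAliveModel cnx = new KeepAliveModel();
        System.out.println(cnx.onInterval()); // first interval: probe is sent
        System.out.println(cnx.onInterval()); // no reply arrived: connection is closed
        System.out.println(cnx.isOpen());
    }
}
```

In this model a peer that has silently gone away is detected after at most two intervals, which is the behaviour we expected for the stale connection on port 34398.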
One possibility is that an exception is thrown in org.apache.pulsar.common.protocol.PulsarHandler#handleKeepAliveTimeout. The scheduled task would then never run again, because the exception is not caught. “ctx.writeAndFlush” may throw a RuntimeException when there is a network problem.
The problem appeared twice. It has not appeared again since we added some logs.
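The general mechanism behind this suspicion can be reproduced with the JDK alone. The sketch below uses `java.util.concurrent.ScheduledExecutorService`, whose `scheduleAtFixedRate` is documented to suppress subsequent executions once a run throws. It is not Pulsar code, and the broker schedules its keep-alive task on a Netty event loop, so whether the same thing happened there is still only a hypothesis. The class name and the simulated failure are made up for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class KeepAliveTaskDemo {

    // Schedules a periodic task that throws on its third run, lets it run for
    // about 300 ms, and returns how many times the task body was entered.
    static int countRuns(boolean catchExceptions) throws InterruptedException {
        AtomicInteger runs = new AtomicInteger();

        // Stand-in for a keep-alive check whose write fails once.
        Runnable check = () -> {
            if (runs.incrementAndGet() == 3) {
                throw new IllegalStateException("simulated write failure");
            }
        };

        // Same check, but the exception does not escape the task.
        Runnable guarded = () -> {
            try {
                check.run();
            } catch (RuntimeException e) {
                // A real handler would log the failure here.
            }
        };

        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(catchExceptions ? guarded : check, 0, 10, TimeUnit.MILLISECONDS);
        Thread.sleep(300);
        executor.shutdownNow();
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("unguarded runs: " + countRuns(false));
        System.out.println("guarded runs: " + countRuns(true));
    }
}
```

The unguarded task stops for good after the run that throws, and nothing is logged unless someone inspects the returned future. The guarded task keeps running. If this is what happened in the broker, wrapping the body of the keep-alive task in a try/catch that logs the error would both keep the check alive and reveal the cause.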
Additional context
Version: 2.8.2
@suiyuzeng - thanks for opening this issue. I have been working in this part of the code base recently, so I am happy to take a look. I should be able to sometime in the next few days. I am going to assign this to myself for now.
@michaeljmarshall We have switched to the release version, and the problem no longer occurs. Thanks very much.