[BUG] Producer with name xxx is already connect to topic

Describe the bug

Master issue: #13061
https://github.com/apache/pulsar-client-go/issues/676
Pulsar version: 2.8.1

Our investigation shows that this problem occurs when the ping/pong timing between the client and the server gradually drifts apart, until the client decides that the connection is closed. The client's close operation then fails for network reasons, while the underlying network connection is not actually torn down. As a result, the Pulsar broker is still waiting for its ping/pong timeout, but the client has already reconnected over the network (from a different port) with the same PartitionProducer and started AddProducer against the broker.

https://github.com/apache/pulsar/pull/11804: this PR rewrites the equals method of the Producer. As a result, when a pulsar-client-go client reconnects from a different port, the old producer cannot be removed, because equals also compares the remoteAddress:

if (producers.remove(producer.getProducerName(), producer)) {

https://github.com/apache/pulsar/pull/12846: this PR removes the equals override and relies on hashCode for the comparison. In that case the old producer cannot be removed either.
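
A minimal sketch of why the conditional remove fails in both cases. The AddressProducer and IdentityProducer classes below are hypothetical stand-ins invented for illustration, not Pulsar's actual Producer class. The only library behavior relied on is that ConcurrentHashMap.remove(key, value) removes the entry only when the stored value equals the given one:

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class ProducerRemoveSketch {

    // Hypothetical producer whose equals() also compares remoteAddress.
    static final class AddressProducer {
        final String name;
        final String remoteAddress;

        AddressProducer(String name, String remoteAddress) {
            this.name = name;
            this.remoteAddress = remoteAddress;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof AddressProducer)) return false;
            AddressProducer other = (AddressProducer) o;
            return name.equals(other.name) && remoteAddress.equals(other.remoteAddress);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, remoteAddress);
        }
    }

    // Hypothetical producer without an equals() override: identity comparison only.
    static final class IdentityProducer {
        final String name;
        final String remoteAddress;

        IdentityProducer(String name, String remoteAddress) {
            this.name = name;
            this.remoteAddress = remoteAddress;
        }
    }

    // Registers a producer from oldAddress, then tries to remove it using
    // a new producer object created after a reconnect from newAddress.
    static boolean removeWithAddressEquals(String oldAddress, String newAddress) {
        ConcurrentHashMap<String, AddressProducer> producers = new ConcurrentHashMap<>();
        producers.put("producer-1", new AddressProducer("producer-1", oldAddress));
        AddressProducer reconnected = new AddressProducer("producer-1", newAddress);
        return producers.remove(reconnected.name, reconnected);
    }

    // Same scenario, but equality falls back to object identity.
    static boolean removeWithIdentityEquals(String oldAddress, String newAddress) {
        ConcurrentHashMap<String, IdentityProducer> producers = new ConcurrentHashMap<>();
        producers.put("producer-1", new IdentityProducer("producer-1", oldAddress));
        IdentityProducer reconnected = new IdentityProducer("producer-1", newAddress);
        return producers.remove(reconnected.name, reconnected);
    }

    public static void main(String[] args) {
        // Same port: equals() matches, the stale entry is removed.
        System.out.println(removeWithAddressEquals("10.0.0.5:50001", "10.0.0.5:50001"));
        // New port after reconnect: equals() fails, the stale entry stays.
        System.out.println(removeWithAddressEquals("10.0.0.5:50001", "10.0.0.5:50002"));
        // Identity equality: a new object never matches the stored one.
        System.out.println(removeWithIdentityEquals("10.0.0.5:50001", "10.0.0.5:50001"));
    }
}
```

In this sketch only the first call removes the entry. In the other two the stale entry stays in the map, which corresponds to the "already connect to topic" situation described here.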

The problem resolves itself once the Pulsar broker detects the ping/pong timeout or a channel error: the broker then closes the connection and cleans up the producer state, and the client's next AddProducer succeeds. Until that happens, however, the client keeps reconnecting and trying to add the producer, and the broker keeps reporting the error: Producer with name is already connect to topic.

Therefore, I feel that the current protocol gives the broker no reliable way to tell whether a reconnecting producer is the same client and may replace its own stale registration. It may be necessary to add some fields that let the client prove "I am me".
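
One way to read the "I am me" idea, sketched under stated assumptions: the client sends a stable instance identifier plus a counter that increases on every reconnect, and the broker lets a registration replace an existing one only when the identifier matches and the counter is higher. The names instanceId, epoch, and addProducer below are hypothetical and chosen for illustration. This is not a description of the Pulsar wire protocol or of the fix that was eventually adopted.

```java
import java.util.HashMap;
import java.util.Map;

public class ProducerIdentitySketch {

    // Hypothetical record of who currently owns a producer name.
    static final class Registration {
        final String instanceId; // stable per client process (assumption)
        final long epoch;        // incremented by the client on each reconnect (assumption)

        Registration(String instanceId, long epoch) {
            this.instanceId = instanceId;
            this.epoch = epoch;
        }
    }

    private final Map<String, Registration> producers = new HashMap<>();

    // Returns true if the registration is accepted, false if the name is
    // held by a different client or by a newer epoch of the same client.
    synchronized boolean addProducer(String producerName, String instanceId, long epoch) {
        Registration existing = producers.get(producerName);
        if (existing != null) {
            boolean sameClient = existing.instanceId.equals(instanceId);
            if (!sameClient || epoch <= existing.epoch) {
                return false; // "already connected" in this sketch
            }
        }
        // Either the name is free, or the same client reconnected with a higher epoch.
        producers.put(producerName, new Registration(instanceId, epoch));
        return true;
    }

    public static void main(String[] args) {
        ProducerIdentitySketch broker = new ProducerIdentitySketch();
        System.out.println(broker.addProducer("producer-1", "client-A", 0)); // first registration
        System.out.println(broker.addProducer("producer-1", "client-B", 0)); // different client
        System.out.println(broker.addProducer("producer-1", "client-A", 1)); // same client, reconnect
    }
}
```

With such a field the broker would not need remoteAddress to decide whether a reconnect supersedes a stale registration, so a changed source port would no longer matter.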

To Reproduce

Steps to reproduce the behavior:

  1. Set keepAliveIntervalSeconds=100 on the broker.
  2. Pick a Pulsar client in any language, for example pulsar-client-go or the Java client.
  3. Use the client to send data to the Pulsar broker.
  4. Use a firewall to block the network between the client and the broker. Keep it blocked for 60 seconds, then remove the firewall rule.
  5. Check the broker log. It now shows the error: Producer with name is already connect to topic

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 16 (15 by maintainers)

Top GitHub Comments

1 reaction
lhotari commented, Jun 2, 2022

@wenbingshen I think that this is not a bug that you are describing. It might be a way to reproduce “Producer with name xxx is already connect to topic”, but if the steps are followed, I think that it’s completely expected behavior.

1 reaction
Technoboy- commented, May 31, 2022

Ah, we found that keepAliveIntervalSeconds exists on both the client and the server side, and the default value is 30s. In your reproduction steps, the server side is set to 100s. The broker cleans up the producer info when the channel becomes inactive, so that takes about 100s. The client side, however, closes the channel after about 30s, and the firewall may prevent that close from reaching the broker. The client then reconnects. In this case, the server side still has the producer info and throws the above exception.
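
The timing in this comment can be worked through with a deliberately simplified model. It assumes that both keep-alive timers start when the firewall goes up and that each side notices the dead connection after exactly one interval. The real keep-alive logic may take longer to fire, so the numbers are only illustrative, and the class and method names are hypothetical:

```java
public class KeepAliveWindowSketch {

    // Seconds during which a reconnected client is rejected because the
    // broker still holds the stale producer (simplified model, see lead-in).
    static long rejectionWindowSeconds(long clientDetectSeconds,
                                       long brokerCleanupSeconds,
                                       long outageSeconds) {
        // The client can re-register only after it has noticed the dead
        // connection AND the network is reachable again.
        long reconnectAt = Math.max(clientDetectSeconds, outageSeconds);
        // The broker drops the stale producer when its own timeout fires.
        return Math.max(0, brokerCleanupSeconds - reconnectAt);
    }

    public static void main(String[] args) {
        // Values from the issue: client 30s, broker 100s, firewall up for 60s.
        System.out.println(rejectionWindowSeconds(30, 100, 60)); // prints 40
        // Matching defaults on both sides: the broker has already cleaned up.
        System.out.println(rejectionWindowSeconds(30, 30, 60));  // prints 0
    }
}
```

Under these assumptions the client reconnects at about 60s while the broker keeps the old producer until about 100s, so AddProducer is rejected for roughly 40 seconds.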
