Failing to cumulatively acknowledge messages as old as 15 seconds

Observed behavior: the topic backlog keeps filling up and is never cleared.
Expected behavior: the topic backlog is cleared.
In further tests, the problem was also observed with an acknowledgment delay as low as 5 seconds (5 consecutive messages).

  • Pulsar 2.4.2, cluster of 4 × brokers/proxies/bookies/ZooKeeper
  • Single persistent non-partitioned topic
  • Failover subscription
  • Single Producer
  • Single Consumer

Namespace creation:

bin/pulsar-admin tenants create cb
bin/pulsar-admin namespaces create cb/test
bin/pulsar-admin namespaces set-backlog-quota cb/test --limit 300M --policy consumer_backlog_eviction
bin/pulsar-admin namespaces set-deduplication cb/test --enable
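With `--policy consumer_backlog_eviction`, once the topic's backlog exceeds the 300M limit, the broker starts discarding the oldest messages from the slowest subscription's backlog instead of holding or rejecting the producer. The sketch below is a simplified model of that policy, not the broker's implementation. It assumes the quota is enforced immediately on every publish, one message at a time, whereas the real broker checks the quota on its own schedule. `BacklogQuotaModel` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Simplified model of a backlog quota with the consumer_backlog_eviction policy.
 * Assumption: the quota is enforced immediately on every publish, one message at a
 * time; the real broker enforces it periodically and at its own granularity.
 */
class BacklogQuotaModel {
    private final long limitBytes;
    private final Deque<Long> backlog = new ArrayDeque<>(); // unacked message sizes, oldest first
    private long backlogBytes = 0;
    private long evictedMessages = 0;

    BacklogQuotaModel(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** A new message joins the backlog; evict the oldest messages while over quota. */
    void publish(long sizeBytes) {
        backlog.addLast(sizeBytes);
        backlogBytes += sizeBytes;
        while (backlogBytes > limitBytes && !backlog.isEmpty()) {
            backlogBytes -= backlog.removeFirst();
            evictedMessages++;
        }
    }

    /** The consumer acknowledges the oldest n messages still in the backlog. */
    void acknowledgeOldest(int n) {
        for (int i = 0; i < n && !backlog.isEmpty(); i++) {
            backlogBytes -= backlog.removeFirst();
        }
    }

    long backlogBytes() { return backlogBytes; }
    int backlogMessages() { return backlog.size(); }
    long evictedMessages() { return evictedMessages; }

    public static void main(String[] args) {
        BacklogQuotaModel model = new BacklogQuotaModel(300); // 300 bytes stands in for 300M
        for (int i = 0; i < 5; i++) {
            model.publish(100); // a consumer that never acknowledges anything
        }
        System.out.println(model.backlogMessages() + " " + model.evictedMessages()); // 3 2
    }
}
```

In this model a consumer that never acknowledges still sees its backlog capped at the quota, with the oldest messages silently dropped.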

Client (shared code):

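import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;
// SystemLogger is a logging helper from the reporter's own codebase (not shown)

public class PulsarTestBase {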

  protected SystemLogger logger;
  protected String clientId;
  protected PulsarClient pulsar;

  protected String pulsarUrl = "pulsar://3.234.220.124:30004/";

  protected String topicName = "persistent://cb/test/topic-1";

  PulsarTestBase(String clientId) {
    this.clientId = clientId;
    logger = new SystemLogger.Builder(clientId).withCloudwatch(false).withPrometheus(false).build();
  }

  protected void createClient() throws PulsarClientException {
    pulsar =
        PulsarClient.builder()
            .serviceUrl(pulsarUrl)
            .connectionsPerBroker(1) // default: 1
            .connectionTimeout(10, TimeUnit.SECONDS)
            .enableTcpNoDelay(true)
            .keepAliveInterval(999, TimeUnit.DAYS)
            .maxBackoffInterval(5, TimeUnit.SECONDS)
            .startingBackoffInterval(1, TimeUnit.SECONDS)
            .statsInterval(60, TimeUnit.SECONDS)
            // number of threads to be used for handling connections to brokers
            .ioThreads(1) // default: 1
            //  number of threads to be used for message listeners
            .listenerThreads(1) // default: 1
            // number of concurrent lookup-requests allowed to send on each broker-connection
            .maxConcurrentLookupRequests(5000) // default: 5000
            .maxLookupRequests(50000) // default: 50000
            // how many broker-rejects within 30 secs before connection to broker is recycled
            .maxNumberOfRejectedRequestPerConnection(50) // default: 50
            // how long to retry a broker op before the op is marked as failed
            .operationTimeout(30, TimeUnit.SECONDS) // default: 30 sec
            .build();
  }
}
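The client above sets `startingBackoffInterval(1, TimeUnit.SECONDS)` and `maxBackoffInterval(5, TimeUnit.SECONDS)`, which bound how long the client waits between retry/reconnection attempts. The sketch below shows the resulting schedule assuming the delay starts at the initial value, doubles after each failed attempt, and is capped at the maximum. The real client may also apply random jitter, which is ignored here, and `BackoffSchedule` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/**
 * Deterministic exponential backoff: start at startMs, double after every failed
 * attempt, never exceed maxMs. Assumption: no random jitter is applied.
 */
class BackoffSchedule {

    static List<Long> delaysMillis(long startMs, long maxMs, int attempts) {
        List<Long> delays = new ArrayList<>();
        long next = Math.min(startMs, maxMs);
        for (int i = 0; i < attempts; i++) {
            delays.add(next);
            next = Math.min(next * 2, maxMs); // double, capped at the maximum
        }
        return delays;
    }

    public static void main(String[] args) {
        long start = TimeUnit.SECONDS.toMillis(1); // startingBackoffInterval(1, SECONDS)
        long max = TimeUnit.SECONDS.toMillis(5);   // maxBackoffInterval(5, SECONDS)
        System.out.println(delaysMillis(start, max, 6)); // [1000, 2000, 4000, 5000, 5000, 5000]
    }
}
```

With these settings the delay reaches its 5-second ceiling after the third attempt and stays there.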

Producer code:

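import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;
import org.apache.pulsar.client.api.HashingScheme;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.Schema;
// Order (protobuf message) and V3 (logger) come from the reporter's own codebase (not shown)

public class TestProducer extends PulsarTestBase {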

  public static void main(String[] args) throws Exception {
    new TestProducer(args[0]);
  }

  TestProducer(String clientId) throws Exception {
    super(clientId);
    createClient();

    Producer<Order> producer =
        pulsar
            .newProducer(Schema.PROTOBUF(Order.class))
            .producerName(clientId)
            .topic(topicName)
            .batchingMaxPublishDelay(1, TimeUnit.MILLISECONDS) // default: 1 ms
            .batchingMaxMessages(1000) // default: 1000
            // batching disabled, as we produce far less often than once per millisecond
            .enableBatching(false) // default: true
            // block vs throw exception if pending-ack queue is full
            .blockIfQueueFull(false) // default: false
            // how many non-acked msgs to buffer before block or throw-exception
            .maxPendingMessages(1000) // default: 1000
            .sendTimeout(30, TimeUnit.SECONDS) // default: 30 sec
            // .initialSequenceId() // TODO what if non-existent ID passed?
            .hashingScheme(HashingScheme.JavaStringHash) // default: java string
            .create();

    V3.log("msg", "producing messages");

    long counter = 0L;

    while (true) {

      Thread.sleep(1000L);

      V3.log("msg", "producing message", "counter", counter);

      producer
          .newMessage()
          .value(Order.newBuilder().setPrice(1d).setSize(counter).build())
          .send();

      counter++;
    }
  }

  // not used above: messages are sent synchronously with send()
  private final BiConsumer<? super MessageId, ? super Throwable> producerResultHandler =
      new BiConsumer<MessageId, Throwable>() {
        @Override
        public void accept(MessageId messageId, Throwable ex) {
          if (ex != null) {
            V3.log(ex, "msg", "pulsar producer submit error", "*metric", "DroppedData", "value", 1);
          }
        }
      };
}
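Enabling deduplication on the namespace (`set-deduplication --enable`), together with the fixed `producerName(clientId)` in the producer above, lets the broker recognize re-sent messages: for each producer name it remembers the highest sequence ID already persisted and drops any message whose sequence ID is not higher. The sketch below models only that rule. `DedupModel` is a hypothetical name, and the real broker also persists this state and tracks in-flight messages, which the sketch ignores.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Simplified model of broker-side message deduplication: for each producer name,
 * remember the highest persisted sequence ID and reject anything not higher.
 * Assumption: ignores persistence of this state and in-flight (pending) messages.
 */
class DedupModel {
    private final Map<String, Long> highestSequenceId = new HashMap<>();

    /** Returns true if the message would be persisted, false if dropped as a duplicate. */
    boolean accept(String producerName, long sequenceId) {
        Long highest = highestSequenceId.get(producerName);
        if (highest != null && sequenceId <= highest) {
            return false; // already persisted for this producer name: duplicate
        }
        highestSequenceId.put(producerName, sequenceId);
        return true;
    }

    public static void main(String[] args) {
        DedupModel broker = new DedupModel();
        System.out.println(broker.accept("producer-a", 0)); // true
        System.out.println(broker.accept("producer-a", 1)); // true
        System.out.println(broker.accept("producer-a", 1)); // false: e.g. re-sent after a timeout
        System.out.println(broker.accept("producer-b", 1)); // true: different producer name
    }
}
```

Because the history is keyed by producer name, a producer that restarts under a different name gets no duplicate protection for messages sent before the restart.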

Consumer code:

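import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.SubscriptionInitialPosition;
import org.apache.pulsar.client.api.SubscriptionType;
// Order, V3 and Pulsar.msgIdToHex come from the reporter's own codebase (not shown)

public class TestConsumer extends PulsarTestBase {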

  public static void main(String[] args) throws Exception {
    new TestConsumer(args[0]);
  }

  TestConsumer(String clientId) throws Exception {
    super(clientId);
    createClient();

    Consumer<Order> consumer =
        pulsar
            .newConsumer(Schema.PROTOBUF(Order.class))
            .consumerName(clientId)
            .subscriptionInitialPosition(SubscriptionInitialPosition.Latest)
            .subscriptionType(SubscriptionType.Failover)
            .subscriptionName("failover-subscription")
            .topic(topicName)
            .subscribe();

    MessageId firstMsgIdToAck = null;
    int msgAckDelayCounter = 0;

    V3.log("msg", "waiting for messages");

    while (true) {
      Message<Order> message = consumer.receive();
      Order order = message.getValue();

      V3.log(
          "msg", "received message",
          "sequenceId", message.getSequenceId(),
          "price", order.getPrice(),
          "size", order.getSize(),
          "producer", message.getProducerName(),
          "messageId", Pulsar.msgIdToHex(message.getMessageId()));

      if (firstMsgIdToAck == null) {
        firstMsgIdToAck = message.getMessageId();
        V3.log("msg", "storing ack message id", "messageId", Pulsar.msgIdToHex(firstMsgIdToAck));
      }

      if (++msgAckDelayCounter == 15) {
        V3.log("msg", "acknowledging message id", "messageId", Pulsar.msgIdToHex(firstMsgIdToAck));
        consumer.acknowledgeCumulativeAsync(firstMsgIdToAck);
        firstMsgIdToAck = null;
        msgAckDelayCounter = 0;
      }
    }
  }
}
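The consumer above stores the ID of the first message in each window of 15 and, when the window fills, passes that stored ID to `acknowledgeCumulativeAsync`. A cumulative ack covers the given message and every earlier message on the subscription, so each call also clears whatever the previous call left behind. The simulation below counts delivered-but-unacknowledged messages under that strategy. It assumes in-order delivery, no redeliveries, and that every asynchronous ack succeeds; `CumulativeAckSimulation` is a hypothetical name.

```java
/**
 * Simulates the consumer's strategy: messages are numbered 0, 1, 2, ...; the first
 * index of each window is stored and, when the window fills, acknowledged cumulatively.
 * Assumptions: in-order delivery, no redeliveries, every async ack succeeds.
 */
class CumulativeAckSimulation {

    /** Returns the largest number of delivered-but-unacknowledged messages observed. */
    static long maxBacklog(int window, int messages) {
        long markDelete = -1;   // highest index covered by a cumulative ack so far
        Long firstToAck = null; // mirrors firstMsgIdToAck
        int counter = 0;        // mirrors msgAckDelayCounter
        long max = 0;
        for (long i = 0; i < messages; i++) {
            if (firstToAck == null) {
                firstToAck = i;
            }
            if (++counter == window) {
                // cumulative ack: everything up to and including firstToAck is acknowledged
                markDelete = Math.max(markDelete, firstToAck);
                firstToAck = null;
                counter = 0;
            }
            max = Math.max(max, i - markDelete);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(maxBacklog(15, 1_000)); // 28
        System.out.println(maxBacklog(5, 1_000));  // 8
    }
}
```

Under those assumptions the unacknowledged count never exceeds 2 × window − 2 (28 for a window of 15), so the acknowledgment pattern by itself would not make the backlog grow without bound. That fits the reporter's later conclusion that the problem was in their own application.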

[Producer log screenshot]

[Consumer log screenshot]

[Admin console screenshot]

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

1 reaction
codelipenghui commented, Feb 6, 2020

@youurayy You are welcome. Feel free to start the discussion again if you find any behavior in Pulsar that does not match your requirements.

0 reactions
youurayy commented, Feb 6, 2020

@codelipenghui Thanks for looking into this. You are right, and it seems our app indeed has an issue with a data leak. I'll reopen this later if there's a real problem on the Pulsar side. Thanks!
