Log compaction fails due to timeout
See original GitHub issueDescribe the bug Log compaction fails due to a timeout.
To Reproduce
- Start pulsar standalone
- Create producer
- Produce messages at high throughput
- Trigger topic compaction
- Read compaction status
Detailed steps to reproduce:
- Clear all state
# rm -r data logs # !!! careful !!! this wipes all your topics etc.
- Start pulsar standalone
bin/pulsar standalone
- Create producer
PulsarClient pulsarClient = PulsarClient.builder()
.serviceUrl("pulsar://192.168.11.22:6650")
.build();
Producer<byte[]> producer = pulsarClient.newProducer()
.topic("my-topic")
.create();
- Produce messages at high throughput
int msgCount = 500000;
long before = System.currentTimeMillis();
for (int i = 0; i < msgCount; i++) {
producer.newMessage()
.key("asdf")
.value(new byte[] {1, 2, 3, 4, 5})
.sendAsync();
}
System.out.println("Sent " + msgCount + " messages in " + (System.currentTimeMillis() - before) + " msec");
- Trigger topic compaction
$ bin/pulsar-admin topics compact persistent://public/default/my-topic
Topic compaction requested for persistent://public/default/my-topic
- Read compaction status
$ bin/pulsar-admin topics compaction-status persistent://public/default/my-topic
Compaction is currently running
$ # WAIT FOR 10 SECONDS
$ bin/pulsar-admin topics compaction-status persistent://public/default/my-topic
Error in compaction
null
Reason: Error compacting: java.util.concurrent.TimeoutException: Timeout
Expected behavior Compaction and no error
Desktop (please complete the following information):
- Pulsar version: 2.5.0 (tar.gz from website)
- OS:
Linux 4.19.0-4-amd64 #1 SMP Debian 4.19.28-2 (2019-03-15) x86_64 unknown unknown GNU/Linux
Logs The complete logs from the start of compaction until the error was read via compaction status
09:47:54.409 [pulsar-web-57-8] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - Starting Pulsar consumer status recorder with config: {
"topicNames" : [ "persistent://public/default/my-topic" ],
"topicsPattern" : null,
"subscriptionName" : "__compaction",
"subscriptionType" : "Exclusive",
"receiverQueueSize" : 1000,
"acknowledgementsGroupTimeMicros" : 100000,
"negativeAckRedeliveryDelayMicros" : 60000000,
"maxTotalReceiverQueueSizeAcrossPartitions" : 50000,
"consumerName" : null,
"ackTimeoutMillis" : 0,
"tickDurationMillis" : 1000,
"priorityLevel" : 0,
"cryptoFailureAction" : "FAIL",
"properties" : { },
"readCompacted" : true,
"subscriptionInitialPosition" : "Latest",
"patternAutoDiscoveryPeriod" : 1,
"regexSubscriptionMode" : "PersistentOnly",
"deadLetterPolicy" : null,
"autoUpdatePartitions" : true,
"replicateSubscriptionState" : false,
"resetIncludeHead" : false,
"keySharedPolicy" : null
}
09:47:54.410 [pulsar-web-57-8] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - Pulsar client config: {
"serviceUrl" : "pulsar://localhost:6650",
"authPluginClassName" : null,
"authParams" : null,
"operationTimeoutMs" : 30000,
"statsIntervalSeconds" : 60,
"numIoThreads" : 1,
"numListenerThreads" : 1,
"connectionsPerBroker" : 1,
"useTcpNoDelay" : true,
"useTls" : false,
"tlsTrustCertsFilePath" : null,
"tlsAllowInsecureConnection" : false,
"tlsHostnameVerificationEnable" : false,
"concurrentLookupRequest" : 5000,
"maxLookupRequest" : 50000,
"maxNumberOfRejectedRequestPerConnection" : 50,
"keepAliveIntervalSeconds" : 30,
"connectionTimeoutMs" : 10000,
"requestTimeoutMs" : 60000,
"initialBackoffIntervalNanos" : 100000000,
"maxBackoffIntervalNanos" : 60000000000
}
09:47:54.411 [pulsar-web-57-8] INFO org.eclipse.jetty.server.RequestLog - 127.0.0.1 - - [31/1/2020:09:47:54 +0900] "PUT /admin/v2/persistent/public/default/my-topic/compaction HTTP/1.1" 204 0 "-" "Pulsar-Java-v2.5.0" 6
09:47:54.411 [pulsar-client-io-85-1] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://public/default/my-topic][__compaction] Subscribing to topic on cnx [id: 0xeb973738, L:/127.0.0.1:60180 - R:localhost/127.0.0.1:6650]
09:47:54.412 [pulsar-io-50-7] INFO org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:60180] Subscribing on topic persistent://public/default/my-topic / __compaction
09:47:54.412 [pulsar-io-50-7] INFO org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/my-topic-__compaction] Rewind from 1544:100 to 1544:0
09:47:54.412 [pulsar-io-50-7] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/my-topic] There are no replicated subscriptions on the topic
09:47:54.412 [pulsar-io-50-7] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/my-topic][__compaction] Created new subscription for 1
09:47:54.412 [pulsar-io-50-7] INFO org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:60180] Created subscription on topic persistent://public/default/my-topic / __compaction
09:47:54.413 [pulsar-client-io-85-1] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://public/default/my-topic][__compaction] Subscribed to topic on localhost/127.0.0.1:6650 -- consumer: 1
09:47:54.413 [pulsar-client-io-85-1] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://public/default/my-topic][__compaction] Seek subscription to message id -1:-1:-1
09:47:54.415 [pulsar-io-50-7] INFO org.apache.pulsar.broker.service.Consumer - Disconnecting consumer: Consumer{subscription=CompactorSubscription{topic=persistent://public/default/my-topic, name=__compaction}, consumerId=1, consumerName=b411f, address=/127.0.0.1:60180}
09:47:54.416 [pulsar-io-50-7] INFO org.apache.pulsar.broker.service.AbstractDispatcherSingleActiveConsumer - Removing consumer Consumer{subscription=CompactorSubscription{topic=persistent://public/default/my-topic, name=__compaction}, consumerId=1, consumerName=b411f, address=/127.0.0.1:60180}
09:47:54.416 [pulsar-client-io-85-1] INFO org.apache.pulsar.client.impl.ClientCnx - [localhost/127.0.0.1:6650] Broker notification of Closed consumer: 1
09:47:54.416 [pulsar-client-io-85-1] INFO org.apache.pulsar.client.impl.ConnectionHandler - [persistent://public/default/my-topic] [__compaction] Closed connection [id: 0xeb973738, L:/127.0.0.1:60180 - R:localhost/127.0.0.1:6650] -- Will try again in 0.1 s
09:47:54.416 [pulsar-io-50-7] INFO org.apache.pulsar.broker.service.persistent.PersistentSubscription - [persistent://public/default/my-topic][__compaction] Successfully disconnected consumers from subscription, proceeding with cursor reset
09:47:54.416 [bookkeeper-ml-workers-OrderedExecutor-2-0] INFO org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/my-topic] Initiate reset position to 1544:-1 on cursor __compaction
09:47:54.424 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/my-topic] reset position to 1544:-1 before current read position 1544:0 on cursor __compaction
09:47:54.425 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:60180] [persistent://public/default/my-topic][__compaction] Reset subscription to message id -1:-1
09:47:54.426 [pulsar-client-io-85-1] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://public/default/my-topic][__compaction] Successfully reset subscription to message id -1:-1:-1
09:47:54.516 [pulsar-timer-87-1] INFO org.apache.pulsar.client.impl.ConnectionHandler - [persistent://public/default/my-topic] [__compaction] Reconnecting after timeout
09:47:54.517 [pulsar-client-io-85-1] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://public/default/my-topic][__compaction] Subscribing to topic on cnx [id: 0xeb973738, L:/127.0.0.1:60180 - R:localhost/127.0.0.1:6650]
09:47:54.517 [pulsar-io-50-7] INFO org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:60180] Subscribing on topic persistent://public/default/my-topic / __compaction
09:47:54.518 [pulsar-io-50-7] INFO org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/my-topic-__compaction] Rewind from 1544:-1 to 1544:0
09:47:54.518 [pulsar-io-50-7] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/my-topic] There are no replicated subscriptions on the topic
09:47:54.518 [pulsar-io-50-7] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/my-topic][__compaction] Created new subscription for 1
09:47:54.518 [pulsar-io-50-7] INFO org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:60180] Created subscription on topic persistent://public/default/my-topic / __compaction
09:47:54.518 [pulsar-client-io-85-1] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://public/default/my-topic][__compaction] Subscribed to topic on localhost/127.0.0.1:6650 -- consumer: 1
09:47:54.526 [pulsar-external-listener-86-1] WARN org.apache.pulsar.client.impl.ConsumerImpl - [persistent://public/default/my-topic] [__compaction] Could not get connection while getLastMessageId -- Will try again in 100 ms
09:47:54.527 [pulsar-external-listener-86-1] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://public/default/my-topic][__compaction] Get topic last message Id
09:47:54.527 [pulsar-client-io-85-1] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://public/default/my-topic][__compaction] Successfully getLastMessageId 1544:37344
09:47:54.527 [pulsar-client-io-85-1] INFO org.apache.pulsar.compaction.TwoPhaseCompactor - Commencing phase one of compaction for persistent://public/default/my-topic, reading to 1544:37344:-1
09:47:57.416 [pulsar-web-57-12] INFO org.eclipse.jetty.server.RequestLog - 127.0.0.1 - - [31/1/2020:09:47:57 +0900] "GET /admin/v2/persistent/public/functions/coordinate/stats HTTP/1.1" 200 887 "-" "Pulsar-Java-v2.5.0" 5
09:48:02.838 [pulsar-web-57-1] INFO org.eclipse.jetty.server.RequestLog - 127.0.0.1 - - [31/1/2020:09:48:02 +0900] "GET /admin/v2/persistent/public/default/my-topic/compaction HTTP/1.1" 200 35 "-" "Pulsar-Java-v2.5.0" 2
09:48:04.646 [pulsar-io-50-7] INFO org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:60180] Closing consumer: 1
09:48:04.646 [pulsar-io-50-7] INFO org.apache.pulsar.broker.service.AbstractDispatcherSingleActiveConsumer - Removing consumer Consumer{subscription=CompactorSubscription{topic=persistent://public/default/my-topic, name=__compaction}, consumerId=1, consumerName=b411f, address=/127.0.0.1:60180}
09:48:04.646 [pulsar-io-50-7] INFO org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:60180] Closed consumer Consumer{subscription=CompactorSubscription{topic=persistent://public/default/my-topic, name=__compaction}, consumerId=1, consumerName=b411f, address=/127.0.0.1:60180}
09:48:04.646 [pulsar-client-io-85-1] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://public/default/my-topic] [__compaction] Closed consumer
09:48:05.941 [pulsar-web-57-3] INFO org.eclipse.jetty.server.RequestLog - 127.0.0.1 - - [31/1/2020:09:48:05 +0900] "GET /admin/v2/persistent/public/default/my-topic/compaction HTTP/1.1" 200 79 "-" "Pulsar-Java-v2.5.0" 2
09:48:27.417 [pulsar-web-57-6] INFO org.eclipse.jetty.server.RequestLog - 127.0.0.1 - - [31/1/2020:09:48:27 +0900] "GET /admin/v2/persistent/public/functions/coordinate/stats HTTP/1.1" 200 887 "-" "Pulsar-Java-v2.5.0" 5
Issue Analytics
- State:
- Created 4 years ago
- Comments:15 (12 by maintainers)
Top Results From Across the Web
Kafka Log Compaction not starting - Stack Overflow
Records won't get compacted until after this period. The setting gives consumers time to get every record.
Read more >test/format compaction doesn't handle timeout error return
mahajan, the problem here is there's a default timeout for compaction of 1200 seconds. The test/format program wasn't configuring a compaction ...
Read more >Back up and Compaction of LQE database fails or take a long ...
Attempts to back up or compact the Lifecycle Query Engine (LQE) in IBM Jazz Reporting Service (JRS) is failing and no error is...
Read more >Kafka 3.3 Documentation
The log helps replicate data between nodes and acts as a re-syncing mechanism for failed nodes to restore their data. The log compaction...
Read more >Slowness in the UI loading or recommendation jobs failing ...
Druid compaction job failures for context tables in Druid log files. ... Recommendation jobs may timeout or fail due to worse query ...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@sijie You are right, I will write an issue.
@EugenDueck Yes