Flakiness issues with subscriptionKeySharedUseConsistentHashing=true / PIP-119 in CPP tests
Describe the bug
Quoting @BewareMyPower from #13963
I tried to use three Java consumers with Key_Shared subscription to consume the topic produced by C++ test KeySharedConsumerTest.testMultiTopics. Sometimes not all messages can be received as well. It looks like there is something wrong with the consistent hashing implementation of Key_Shared dispatcher.
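For readers unfamiliar with the idea, here is a minimal sketch of how a consistent-hash ring can assign message keys to consumers. It is a toy illustration only and is not the broker's Key_Shared dispatcher code: the class name HashRing, the helper distribute, the FNV-1a hash, the 100 points per consumer, and the "key-N" key format are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Illustrative only: a toy consistent-hash ring, NOT Pulsar's dispatcher code.
class HashRing {
public:
    // Place `pointsPerConsumer` virtual points on the ring for one consumer.
    void addConsumer(const std::string& name, int pointsPerConsumer = 100) {
        for (int i = 0; i < pointsPerConsumer; ++i) {
            ring_[hash(name + "#" + std::to_string(i))] = name;
        }
    }

    // Remove every ring point that belongs to the given consumer.
    void removeConsumer(const std::string& name) {
        for (auto it = ring_.begin(); it != ring_.end();) {
            if (it->second == name) {
                it = ring_.erase(it);
            } else {
                ++it;
            }
        }
    }

    // A key is owned by the first ring point at or after its hash,
    // wrapping around to the start of the ring if necessary.
    const std::string& select(const std::string& key) const {
        assert(!ring_.empty());
        auto it = ring_.lower_bound(hash(key));
        if (it == ring_.end()) {
            it = ring_.begin();
        }
        return it->second;
    }

private:
    // 64-bit FNV-1a, chosen here only because it is short and deterministic.
    static std::uint64_t hash(const std::string& s) {
        std::uint64_t h = 14695981039346656037ULL;
        for (unsigned char c : s) {
            h ^= c;
            h *= 1099511628211ULL;
        }
        return h;
    }

    std::map<std::uint64_t, std::string> ring_;
};

// Count how many of `numMessages` keyed messages each consumer would own.
// Keys are the hypothetical strings "key-0", "key-1", ...
std::map<std::string, int> distribute(const HashRing& ring, int numMessages) {
    std::map<std::string, int> counts;
    for (int i = 0; i < numMessages; ++i) {
        ++counts[ring.select("key-" + std::to_string(i))];
    }
    return counts;
}
```

In this model every key maps to exactly one consumer, so the per-consumer counts always add up to the number of messages sent, and removing a consumer only reassigns the keys that consumer owned. A total that falls short of the number sent, as reported here, is therefore not something the hashing scheme by itself should produce.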
I also made similar observations based on C++ test logs:
FAILED TESTS (3/279):
9941 ms: ./main KeySharedConsumerTest.testMultiTopics (try #1)
6242 ms: ./main KeySharedConsumerTest.testKeyBasedBatching (try #1)
9740 ms: ./main KeySharedConsumerTest.testMultiTopics (try #2)
full logs in https://github.com/apache/pulsar/suites/5064608592/artifacts/150614790
2022-01-26 11:26:24.372 INFO [140238723073792] MultiTopicsConsumerImpl:95 | Successfully Subscribed to Topics
2022-01-26 11:26:33.950 INFO [140238845213440] KeySharedConsumerTest:124 | messagesPerConsumer: {0 => 1098, 1 => 811, 2 => 1027}
/pulsar/pulsar-client-cpp/tests/KeySharedConsumerTest.cc:129: Failure
Value of: expectedNumTotalMessages
Actual: 3000
Expected: numTotalMessages
Which is: 2936
2022-01-26 11:26:33.951 INFO [140238845213440] ClientImpl:496 | Closing Pulsar client with 3 producers and 3 consumers
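The failing assertion compares the number of messages sent with the sum of the per-consumer counts printed in the log. A small sketch of that arithmetic, using the values from the log line above (the function names totalReceived and missingMessages are hypothetical and are not taken from KeySharedConsumerTest.cc):

```cpp
#include <cassert>
#include <map>
#include <numeric>
#include <utility>

// Sum the per-consumer receive counts into one total.
int totalReceived(const std::map<int, int>& messagesPerConsumer) {
    return std::accumulate(
        messagesPerConsumer.begin(), messagesPerConsumer.end(), 0,
        [](int sum, const std::pair<const int, int>& entry) {
            return sum + entry.second;
        });
}

// Number of messages that never reached any consumer (0 means no loss).
int missingMessages(int expectedTotal,
                    const std::map<int, int>& messagesPerConsumer) {
    assert(expectedTotal >= 0);
    return expectedTotal - totalReceived(messagesPerConsumer);
}
```

With the logged counts {0 => 1098, 1 => 811, 2 => 1027}, the total is 1098 + 811 + 1027 = 2936, so 64 of the 3000 expected messages were not received by any consumer in that run.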
To Reproduce
Steps to reproduce the behavior:
- Set subscriptionKeySharedUseConsistentHashing=true
- Produce messages to multiple topics using key shared
- Consume messages
Expected behavior
There shouldn’t be any message loss.
Top GitHub Comments
Yeah, here is a screenshot of my Java consumer application. The topic was created by C++ UT and received 3000 messages from C++ producer. Java consumers should have received 3000 messages in total, and sometimes it works well.
The code is
But I cannot reproduce it with Java UT easily at the moment.
I agree. I renamed the issue. I’ll close this issue since it seems to be addressed. @BewareMyPower Please reopen if there’s more to do.