Deduplication causes a lot of Full GCs
Describe the bug
2019/02/14 Added
In our experiments, we found that enabling deduplication causes a lot of Full GCs in Brokers, which seems to cause the ZooKeeper session to expire and, finally, the Broker to shut down.
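The link between Full GCs and session expiration can be sketched as follows: a stop-the-world pause that lasts at least as long as the ZooKeeper session timeout leaves the Broker unable to send heartbeats for that whole window, so the session can expire. The snippet below is only an illustration of that comparison. The pause durations and the 30-second timeout are hypothetical values, not measurements from this report, and `pauses_exceeding_timeout` is a made-up helper name.

```python
def pauses_exceeding_timeout(pause_ms, session_timeout_ms):
    """Return the GC pauses (in ms) that last at least as long as the session timeout."""
    return [p for p in pause_ms if p >= session_timeout_ms]


# Hypothetical values for illustration only; not taken from the report.
observed_pauses_ms = [120, 4500, 31000, 800]
session_timeout_ms = 30000  # assumed timeout; check the Broker's actual setting

risky = pauses_exceeding_timeout(observed_pauses_ms, session_timeout_ms)
print(risky)
```

Shorter pauses can also contribute if they come in quick succession, so a check like this is a lower bound rather than a full diagnosis.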
2019/02/07 Original report of unexpected Broker shutdown
We have seen an unexpected Broker shutdown.
- There were LedgerFencedExceptions for a lot of ledgers:
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] WARN o.a.bookkeeper.client.PendingAddOp - Fencing exception on write: L9104171 E28233 on xxx.xx.xx.xx:3181
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] ERROR o.a.bookkeeper.client.LedgerHandle - Closing ledger 9104171 due to LedgerFencedException: Ledger has been fenced off. Some other client must have opened it to read
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] WARN o.a.bookkeeper.client.PendingAddOp - Fencing exception on write: L9104171 E28234 on xxx.xx.xx.xx:3181
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] ERROR o.a.bookkeeper.client.LedgerHandle - Closing ledger 9104171 due to LedgerFencedException: Ledger has been fenced off. Some other client must have opened it to read
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] WARN o.a.bookkeeper.client.PendingAddOp - Fencing exception on write: L9104171 E28235 on xxx.xx.xx.xx:3181
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] ERROR o.a.bookkeeper.client.LedgerHandle - Closing ledger 9104171 due to LedgerFencedException: Ledger has been fenced off. Some other client must have opened it to read
...
- There were a lot of “Failed to create producer” errors for the geo-replicator producer:
01:47:09.907 [pulsar-io-21-31] ERROR o.a.pulsar.client.impl.ProducerImpl - [persistent://<topicname>] [pulsar.repl.<localcluster>] Failed to create producer: Producer with name 'pulsar.repl.<localcluster>' is already connected to topic
- Finally, the Broker suddenly stopped with:
01:47:09.963 [pulsar-ordered-OrderedExecutor-4-0-EventThread] ERROR o.a.p.z.ZooKeeperSessionWatcher - ZooKeeper session already expired, invoking shutdown
Additional context
- Broker OS: CentOS Linux release 7.6.1810
- Broker version: 2.1.1
Top GitHub Comments
As @hrsakai pointed out, the fix was ineffective because it was applied to a code path that’s not being used.
The problem is that while the cursor is set as “inactive” in the beginning, a periodic check is flipping the state back to “active”:
https://github.com/apache/pulsar/blob/43380523c5269c152f61b2aa8f7b70281c770d1d/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L878-L885
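The pattern described above can be sketched in a few lines. This is not Pulsar's code and not the fix in #3612; the class, the function names, the `always_inactive` marker, and the backlog-threshold criterion are all hypothetical, chosen only to show how a periodic check can undo a state that was set once at creation, and one way such a check could be guarded.

```python
class Cursor:
    """Hypothetical cursor with an 'active' flag (not the real Pulsar class)."""

    def __init__(self, name, backlog, always_inactive=False):
        self.name = name
        self.backlog = backlog
        self.always_inactive = always_inactive
        # A cursor marked always_inactive starts out inactive.
        self.active = not always_inactive


def periodic_check_unguarded(cursors, max_backlog):
    # Activates every cursor with a small backlog, ignoring why it was inactive.
    for cursor in cursors:
        if cursor.backlog < max_backlog:
            cursor.active = True


def periodic_check_guarded(cursors, max_backlog):
    # Same check, but cursors that must stay inactive are skipped.
    for cursor in cursors:
        if cursor.always_inactive:
            continue
        if cursor.backlog < max_backlog:
            cursor.active = True


flipped = Cursor("example-cursor", backlog=0, always_inactive=True)
periodic_check_unguarded([flipped], max_backlog=1000)

kept = Cursor("example-cursor", backlog=0, always_inactive=True)
periodic_check_guarded([kept], max_backlog=1000)

print(flipped.active, kept.active)
```

In the unguarded variant the initial "inactive" setting only survives until the first periodic run, which matches the behaviour described in the comment.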
Working on a fix.
Updated #3612 with the correct fix.