Deduplication causes a lot of Full GCs

Describe the bug

2019/02/14 Added

In our experiments, we found that enabling deduplication causes many full GCs in the brokers, which appears to expire their ZooKeeper sessions and eventually shut them down.
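A ZooKeeper session expires when the server hears nothing from the client within the session timeout, and a long stop-the-world pause can block those heartbeats. That is one plausible path from heavy GC activity to the expiry described above. To check how much time a broker JVM spends in garbage collection, the sketch below samples cumulative collector statistics through the standard `java.lang.management` API. `GcStats`, the 1-second interval, and the 500 ms threshold are hypothetical choices. Collector names depend on the GC in use (with G1, full collections are typically reported under the "G1 Old Generation" bean). Because the numbers are cumulative, the check flags heavy GC activity within an interval rather than a single long pause.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: snapshots cumulative GC counts and times for this JVM. */
public class GcStats {

    /** Returns collector name -> {collection count, accumulated collection time in ms}. */
    public static Map<String, long[]> snapshot() {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters may return -1 when the value is undefined for a collector.
            stats.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return stats;
    }

    /**
     * Reports collectors whose accumulated GC time grew by more than thresholdMs
     * between two snapshots. The threshold is an assumption; it could be tied to
     * a fraction of the ZooKeeper session timeout.
     */
    public static Map<String, Long> slowCollectors(Map<String, long[]> before,
                                                   Map<String, long[]> after,
                                                   long thresholdMs) {
        Map<String, Long> slow = new LinkedHashMap<>();
        for (Map.Entry<String, long[]> e : after.entrySet()) {
            long[] prev = before.get(e.getKey());
            if (prev == null) {
                continue; // collector not present in the earlier snapshot
            }
            long deltaMs = e.getValue()[1] - prev[1];
            if (deltaMs > thresholdMs) {
                slow.put(e.getKey(), deltaMs);
            }
        }
        return slow;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, long[]> before = snapshot();
        Thread.sleep(1000); // sampling interval (assumption)
        Map<String, long[]> after = snapshot();
        slowCollectors(before, after, 500).forEach((name, ms) ->
            System.out.println(name + " spent " + ms + " ms in GC during the interval"));
    }
}
```

In practice the same numbers are usually read from GC logs or an external monitoring agent; the MXBean route is shown only because it needs nothing beyond the JDK.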

2019/02/07 Original report of unexpected Broker shutdown

We have seen unexpected broker shutdowns.

  1. There were LedgerFencedExceptions for a lot of ledgers:
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] WARN  o.a.bookkeeper.client.PendingAddOp   - Fencing exception on write: L9104171 E28233 on xxx.xx.xx.xx:3181
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] ERROR o.a.bookkeeper.client.LedgerHandle   - Closing ledger 9104171 due to LedgerFencedException: Ledger has been fenced off. Some other client must have opened it to read
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] WARN  o.a.bookkeeper.client.PendingAddOp   - Fencing exception on write: L9104171 E28234 on xxx.xx.xx.xx:3181
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] ERROR o.a.bookkeeper.client.LedgerHandle   - Closing ledger 9104171 due to LedgerFencedException: Ledger has been fenced off. Some other client must have opened it to read
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] WARN  o.a.bookkeeper.client.PendingAddOp   - Fencing exception on write: L9104171 E28235 on xxx.xx.xx.xx:3181
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] ERROR o.a.bookkeeper.client.LedgerHandle   - Closing ledger 9104171 due to LedgerFencedException: Ledger has been fenced off. Some other client must have opened it to read
...
  2. There were many “Failed to create producer” errors for the geo-replicator producer:
01:47:09.907 [pulsar-io-21-31] ERROR o.a.pulsar.client.impl.ProducerImpl  - [persistent://<topicname>] [pulsar.repl.<localcluster>] Failed to create producer: Producer with name 'pulsar.repl.<localcluster>' is already connected to topic
  3. Finally, the broker suddenly stopped with:
01:47:09.963 [pulsar-ordered-OrderedExecutor-4-0-EventThread] ERROR o.a.p.z.ZooKeeperSessionWatcher      - ZooKeeper session already expired, invoking shutdown
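The fencing errors in step 1 follow from BookKeeper's fencing rule: when another client opens a ledger for recovery, the ledger is fenced, and any further writes from the previous writer are rejected ("Some other client must have opened it to read"). That is consistent with another broker taking over these topics after this broker's session expired, although the report does not confirm that sequence. The model below is a deliberately simplified, single-process sketch of the rule; `FencedLedger`, `addEntry`, and `openForRecovery` are hypothetical names and are not the BookKeeper client API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, in-memory model of BookKeeper-style ledger fencing. */
public class FencedLedger {

    /** Thrown to the original writer once the ledger has been fenced. */
    public static class LedgerFencedException extends Exception {
        public LedgerFencedException(long ledgerId) {
            super("Ledger " + ledgerId + " has been fenced off");
        }
    }

    private final long ledgerId;
    private final List<byte[]> entries = new ArrayList<>();
    private boolean fenced = false;

    public FencedLedger(long ledgerId) {
        this.ledgerId = ledgerId;
    }

    /** Writer path: appends an entry and returns its entry id, unless the ledger is fenced. */
    public synchronized long addEntry(byte[] data) throws LedgerFencedException {
        if (fenced) {
            throw new LedgerFencedException(ledgerId);
        }
        entries.add(data);
        return entries.size() - 1;
    }

    /**
     * Recovery path used by a new owner: fences the ledger so the old writer can
     * no longer append, then returns the last entry id (-1 if the ledger is empty).
     */
    public synchronized long openForRecovery() {
        fenced = true;
        return entries.size() - 1;
    }
}
```

In this model, once `openForRecovery()` has run, every later `addEntry` from the old writer fails, which mirrors the repeated "Fencing exception on write" lines above, one per entry id (E28233, E28234, E28235).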

Additional context
Broker OS: CentOS Linux release 7.6.1810
Broker version: 2.1.1

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 19 (19 by maintainers)

Top GitHub Comments

1 reaction
merlimat commented, Mar 6, 2019

As @hrsakai pointed out, the fix was ineffective because it was applied to a code path that’s not being used.

The problem is that while the cursor is set as “inactive” at the beginning, a periodic check flips its state back to “active”:

https://github.com/apache/pulsar/blob/43380523c5269c152f61b2aa8f7b70281c770d1d/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L878-L885

Working on a fix.
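The failure mode described in this comment, a state set once at creation and then silently overwritten by a periodic task, can be reproduced in a few lines. This is a hypothetical model, not Pulsar's `ManagedLedgerImpl` code: `CursorActivation`, `Cursor`, `pinnedInactive`, and the two `periodicCheck*` methods are illustrative names, and the guarded check is one possible way to keep such a cursor inactive, not necessarily what #3612 does. The model also assumes that an active cursor causes the broker to keep entries cached on its behalf, which is the kind of memory pressure that could lead to the full GCs reported above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a cursor whose "active" flag is reset by a periodic check. */
public class CursorActivation {

    static class Cursor {
        final String name;
        final boolean pinnedInactive; // hypothetical: true for cursors that should never be active
        boolean active;

        Cursor(String name, boolean pinnedInactive) {
            this.name = name;
            this.pinnedInactive = pinnedInactive;
            this.active = !pinnedInactive; // pinned cursors start inactive; others start active (assumption)
        }
    }

    final List<Cursor> cursors = new ArrayList<>();

    /** Buggy check: reactivates every cursor, undoing the initial "inactive" state. */
    void periodicCheckBuggy() {
        for (Cursor c : cursors) {
            c.active = true;
        }
    }

    /** Guarded check: leaves pinned cursors inactive. */
    void periodicCheckGuarded() {
        for (Cursor c : cursors) {
            if (!c.pinnedInactive) {
                c.active = true;
            }
        }
    }

    public static void main(String[] args) {
        CursorActivation ml = new CursorActivation();
        Cursor dedup = new Cursor("dedup-cursor", true); // hypothetical cursor name
        ml.cursors.add(dedup);

        ml.periodicCheckBuggy();
        System.out.println("after buggy check: active=" + dedup.active);   // prints active=true

        dedup.active = false;
        ml.periodicCheckGuarded();
        System.out.println("after guarded check: active=" + dedup.active); // prints active=false
    }
}
```

The point of the guard is that the "inactive" decision has to be respected by every code path that can change the flag, not only by the one that sets it at creation; fixing only the creation path leaves the periodic check free to undo it.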

0 reactions
merlimat commented, Mar 7, 2019

Updated #3612 with the correct fix.