The data is not synchronized when a new node without data is added in dledger mode
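In a 3-node DLedger cluster that was already set up and running normally, I stopped one slave process, cleared its store/dledger_store directory, and restarted the process, in order to simulate adding a new empty node to a DLedger cluster.
However, no data was ever synchronized to this node.
The master's broker_default.log contains a related exception, thrown from the try-catch block in io.openmessaging.storage.dledger.DLedgerEntryPusher.EntryDispatcher#doWork: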
2020-02-16 18:22:24 ERROR EntryDispatcher-n0-n2 - [Push-n2]Error in EntryDispatcher-n0-n2 writeIndex=1435933765 compareIndex=-1
io.openmessaging.storage.dledger.exception.DLedgerException: [code=410,name=INDEX_OUT_OF_RANGE,desc=] 1435933765 should between 1551149011-1815212110
    at io.openmessaging.storage.dledger.utils.PreConditions.check(PreConditions.java:41) ~[dledger-0.1.jar:na]
    at io.openmessaging.storage.dledger.store.file.DLedgerMmapFileStore.get(DLedgerMmapFileStore.java:479) ~[dledger-0.1.jar:na]
    at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doAppendInner(DLedgerEntryPusher.java:389) ~[dledger-0.1.jar:na]
    at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doAppend(DLedgerEntryPusher.java:464) ~[dledger-0.1.jar:na]
    at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doWork(DLedgerEntryPusher.java:602) ~[dledger-0.1.jar:na]
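After I manually restarted the master once, data synchronization proceeded normally again.
What is the cause of this problem? And what is the correct procedure for adding a new node to the cluster? Is there any documentation?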
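For readers unfamiliar with the error, the message "1435933765 should between 1551149011-1815212110" reads as a range check: the master was asked to read entry 1435933765, but the store reported holding only entries 1551149011 through 1815212110. The following is a minimal sketch of such a check, not DLedger's actual code; the class and method names (`LedgerRangeCheck`, `inRange`, `describe`) are hypothetical, and only the three numbers are taken from the log above.

```java
// Hypothetical illustration of an index range check; not DLedger's implementation.
public class LedgerRangeCheck {

    /** Returns true when index lies within [ledgerBegin, ledgerEnd], inclusive. */
    static boolean inRange(long index, long ledgerBegin, long ledgerEnd) {
        return index >= ledgerBegin && index <= ledgerEnd;
    }

    /** Builds a short human-readable verdict for the given index and range. */
    static String describe(long index, long ledgerBegin, long ledgerEnd) {
        if (inRange(index, ledgerBegin, ledgerEnd)) {
            return index + " is readable";
        }
        return index + " is outside " + ledgerBegin + "-" + ledgerEnd;
    }

    public static void main(String[] args) {
        // Values copied from the ERROR line in the issue.
        long writeIndex = 1435933765L;
        long ledgerBegin = 1551149011L;
        long ledgerEnd = 1815212110L;
        System.out.println(describe(writeIndex, ledgerBegin, ledgerEnd));
    }
}
```

With the values from the log, the requested index is below the lower bound, which is consistent with the oldest entries no longer being available on the master.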
Issue Analytics
- Created 4 years ago
- Comments:7 (7 by maintainers)
@RongtongJin That would be nice! I will try to work on this.
@RongtongJin I just reproduced it. I think the key point is space cleaning. The exception always happened after a space clean triggered by the physicRatio or something else. While data synchronization is running between the master and a clean slave, is there any chance that the requested index no longer exists after the oldest commitLog/index files have been cleaned, which would cause this exception?
2020-02-18 19:57:58 INFO DLedgerCleanSpaceService - unmap file[REF:0] /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002244120412160 OK
2020-02-18 19:57:58 INFO DLedgerCleanSpaceService - close file channel /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002244120412160 OK
2020-02-18 19:57:58 INFO DLedgerCleanSpaceService - delete file[REF:0] /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002244120412160 OK, W:1073741824 M:1073741824, 96
2020-02-18 19:57:59 INFO DLedgerCleanSpaceService - unmap file[REF:0] /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002245194153984 OK
2020-02-18 19:57:59 INFO DLedgerCleanSpaceService - close file channel /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002245194153984 OK
2020-02-18 19:57:59 INFO DLedgerCleanSpaceService - delete file[REF:0] /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002245194153984 OK, W:1073741824 M:1073741824, 98
2020-02-18 19:58:00 WARN AdminBrokerThread_6 - matched, but hold failed, request pos=0 fileFromOffset=2244120412160
2020-02-18 19:58:00 INFO DLedgerCleanSpaceService - unmap file[REF:0] /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002246267895808 OK
2020-02-18 19:58:00 INFO DLedgerCleanSpaceService - close file channel /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002246267895808 OK
2020-02-18 19:58:00 INFO DLedgerCleanSpaceService - delete file[REF:0] /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002246267895808 OK, W:1073741824 M:1073741824, 97
2020-02-18 19:58:00 INFO DLedgerFlushDataService - Flush data cost=696 ms
2020-02-18 19:58:00 INFO DLedgerFlushDataService - Flush data cost=507 ms
2020-02-18 19:58:00 INFO DLedgerCleanSpaceService - unmap file[REF:0] /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002247341637632 OK
2020-02-18 19:58:00 INFO DLedgerCleanSpaceService - close file channel /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002247341637632 OK
2020-02-18 19:58:00 INFO DLedgerCleanSpaceService - delete file[REF:0] /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002247341637632 OK, W:1073741824 M:1073741824, 97
2020-02-18 19:58:01 INFO QuorumAckChecker - [n0][LEADER] term=6 ledgerBegin=892303262 ledgerEnd=1696913917 committed=1696913917 watermarks={6:{"n0":1696913917,"n1":899552530,"n2":1696913917}}
2020-02-18 19:58:01 INFO DLedgerCleanSpaceService - unmap file[REF:0] /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002248415379456 OK
2020-02-18 19:58:01 INFO DLedgerCleanSpaceService - close file channel /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002248415379456 OK
2020-02-18 19:58:01 INFO DLedgerFlushDataService - Flush data cost=524 ms
2020-02-18 19:58:01 INFO DLedgerCleanSpaceService - delete file[REF:0] /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002248415379456 OK, W:1073741824 M:1073741824, 95
2020-02-18 19:58:01 INFO DLedgerCleanSpaceService - unmap file[REF:0] /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002249489121280 OK
2020-02-18 19:58:01 INFO DLedgerCleanSpaceService - close file channel /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002249489121280 OK
2020-02-18 19:58:02 INFO DLedgerCleanSpaceService - delete file[REF:0] /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002249489121280 OK, W:1073741824 M:1073741824, 129
2020-02-18 19:58:02 INFO DLedgerCleanSpaceService - unmap file[REF:0] /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002250562863104 OK
2020-02-18 19:58:02 INFO DLedgerCleanSpaceService - close file channel /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002250562863104 OK
2020-02-18 19:58:02 INFO DLedgerCleanSpaceService - delete file[REF:0] /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002250562863104 OK, W:1073741824 M:1073741824, 114
2020-02-18 19:58:02 INFO DLedgerCleanSpaceService - unmap file[REF:0] /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002251636604928 OK
2020-02-18 19:58:02 INFO DLedgerCleanSpaceService - close file channel /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002251636604928 OK
2020-02-18 19:58:02 INFO DLedgerCleanSpaceService - delete file[REF:0] /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002251636604928 OK, W:1073741824 M:1073741824, 112
2020-02-18 19:58:03 INFO DLedgerCleanSpaceService - unmap file[REF:0] /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002252710346752 OK
2020-02-18 19:58:03 INFO DLedgerCleanSpaceService - close file channel /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002252710346752 OK
2020-02-18 19:58:03 WARN EntryDispatcher-n0-n1 - matched, but hold failed, request pos=933544355 fileFromOffset=2252710346752
2020-02-18 19:58:03 ERROR EntryDispatcher-n0-n1 - [Push-n1]Error in EntryDispatcher-n0-n1 writeIndex=899589775 compareIndex=-1
io.openmessaging.storage.dledger.exception.DLedgerException: [code=414,name=DISK_ERROR,desc=] Get null data for 899589775
    at io.openmessaging.storage.dledger.utils.PreConditions.check(PreConditions.java:41) ~[dledger-0.1.jar:na]
    at io.openmessaging.storage.dledger.store.file.DLedgerMmapFileStore.get(DLedgerMmapFileStore.java:489) ~[dledger-0.1.jar:na]
    at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doAppendInner(DLedgerEntryPusher.java:389) ~[dledger-0.1.jar:na]
    at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doAppend(DLedgerEntryPusher.java:464) ~[dledger-0.1.jar:na]
    at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doWork(DLedgerEntryPusher.java:602) ~[dledger-0.1.jar:na]
    at io.openmessaging.storage.dledger.ShutdownAbleThread.run(ShutdownAbleThread.java:87) [dledger-0.1.jar:na]
2020-02-18 19:58:03 INFO DLedgerCleanSpaceService - delete file[REF:0] /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002252710346752 OK, W:1073741824 M:1073741824, 100
2020-02-18 19:58:03 INFO DLedgerCleanSpaceService - unmap file[REF:0] /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002253784088576 OK
2020-02-18 19:58:03 INFO DLedgerCleanSpaceService - close file channel /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002253784088576 OK
2020-02-18 19:58:03 INFO DLedgerCleanSpaceService - delete file[REF:0] /data/rocketmq/broker/store/dledger_store/dledger-n0/data/00000002253784088576 OK, W:1073741824 M:1073741824, 0
2020-02-18 19:58:03 INFO DLedgerCleanSpaceService - Clean space count=10 timeUp=false checkExpired=true forceClean=true enableForceClean=true diskFull=false storeBaseRatio=0.8500003765681342 dataRatio=0.8500003765681342
2020-02-18 19:58:03 INFO DLedgerCleanSpaceService - unmap file[REF:0] /data/rocketmq/broker/store/dledger_store/dledger-n0/index/00000000028521267200 OK
2020-02-18 19:58:03 INFO DLedgerCleanSpaceService - close file channel /data/rocketmq/broker/store/dledger_store/dledger-n0/index/00000000028521267200 OK
2020-02-18 19:58:03 INFO DLedgerCleanSpaceService - delete file[REF:0] /data/rocketmq/broker/store/dledger_store/dledger-n0/index/00000000028521267200 OK, W:167772160 M:167772160, 13
2020-02-18 19:58:03 ERROR EntryDispatcher-n0-n1 - [Push-n1]Error in EntryDispatcher-n0-n1 writeIndex=899589775 compareIndex=-1
io.openmessaging.storage.dledger.exception.DLedgerException: [code=410,name=INDEX_OUT_OF_RANGE,desc=] 899589775 should between 900518572-1696925904
    at io.openmessaging.storage.dledger.utils.PreConditions.check(PreConditions.java:41) ~[dledger-0.1.jar:na]
    at io.openmessaging.storage.dledger.store.file.DLedgerMmapFileStore.get(DLedgerMmapFileStore.java:479) ~[dledger-0.1.jar:na]
    at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doAppendInner(DLedgerEntryPusher.java:389) ~[dledger-0.1.jar:na]
    at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doAppend(DLedgerEntryPusher.java:464) ~[dledger-0.1.jar:na]
    at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doWork(DLedgerEntryPusher.java:602) ~[dledger-0.1.jar:na]
    at io.openmessaging.storage.dledger.ShutdownAbleThread.run(ShutdownAbleThread.java:87) [dledger-0.1.jar:na]
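The log is consistent with that hypothesis: at 19:58:01 the leader reports ledgerBegin=892303262 with n1 at 899552530, and after the clean the readable range starts at 900518572 while the push to n1 is still at writeIndex=899589775. To make the suspected race concrete, here is a toy model, assuming a store whose lower bound only moves forward when old files are deleted and a pusher that advances a fixed number of entries per round. Everything in it (`CleanRaceSketch`, `Store`, `firstFailingRound`, and the per-round rates) is hypothetical and greatly simplified; it is not how DLedger schedules pushing or cleaning.

```java
// Toy model of "cleaning outruns catch-up"; all names and numbers are illustrative.
public class CleanRaceSketch {

    /** Simplified leader store: only tracks the readable index range. */
    static final class Store {
        long ledgerBegin;
        final long ledgerEnd;

        Store(long ledgerBegin, long ledgerEnd) {
            this.ledgerBegin = ledgerBegin;
            this.ledgerEnd = ledgerEnd;
        }

        /** Deleting the oldest files moves the lower bound forward. */
        void cleanOldest(long entries) {
            ledgerBegin = Math.min(ledgerBegin + entries, ledgerEnd);
        }

        boolean has(long index) {
            return index >= ledgerBegin && index <= ledgerEnd;
        }
    }

    /**
     * Each round the pusher reads writeIndex, advances by pushPerRound,
     * then the cleaner removes cleanPerRound entries from the front.
     * Returns the round in which the read fails, or -1 if the follower
     * reaches ledgerEnd (or maxRounds elapse) without a failure.
     */
    static int firstFailingRound(Store store, long writeIndex,
                                 long pushPerRound, long cleanPerRound, int maxRounds) {
        for (int round = 1; round <= maxRounds; round++) {
            if (!store.has(writeIndex)) {
                return round; // analogous to INDEX_OUT_OF_RANGE in the log
            }
            writeIndex += pushPerRound;
            if (writeIndex > store.ledgerEnd) {
                return -1; // follower caught up
            }
            store.cleanOldest(cleanPerRound);
        }
        return -1;
    }

    public static void main(String[] args) {
        // Cleaner faster than pusher: the read falls behind ledgerBegin.
        System.out.println(firstFailingRound(new Store(0, 1000), 0, 10, 50, 100));
        // Pusher faster than cleaner: the follower catches up.
        System.out.println(firstFailingRound(new Store(0, 1000), 0, 100, 50, 100));
    }
}
```

In this model the failure appears only when the front of the ledger is removed faster than the empty follower is fed, which matches the observation that the exception shows up right after a forced space clean.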