ZNP Adapter Commissioning/Restore Decision Process
Current State
Currently the following startup process is implemented (as per startZnp.ts):
- Check if the adapter has been configured using the `hasConfigured` NV item:
  - Yes: Check if configuration matches data in NV RAM:
    - Yes: Regular `startupFromApp`
    - No: Start BDB commissioning
  - No: Check if coordinator backup exists:
    - Yes: Restore from backup and `startupFromApp`
    - No: Check if configuration matches data in NV RAM:
      - Yes: Regular `startupFromApp`
      - No: Start BDB commissioning
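The decision tree above can be written down as a small pure function, which makes it easier to reason about the individual branches. This is only a sketch of the process as described here, not the actual code in startZnp.ts; the `StartupState` interface, its field names, and the action names are assumptions made for illustration.

```typescript
// Possible outcomes of the startup decision (names are illustrative).
type StartupAction = "startupFromApp" | "restoreAndStartup" | "bdbCommissioning";

// Hypothetical inputs; a real adapter would derive these from NV reads and the backup file.
interface StartupState {
    hasConfigured: boolean;   // the hasConfigured NV item is present
    configMatchesNv: boolean; // configuration matches data in NV RAM
    backupExists: boolean;    // a coordinator backup is available
}

function decideStartup(state: StartupState): StartupAction {
    if (state.hasConfigured) {
        // A configured adapter never considers the backup.
        return state.configMatchesNv ? "startupFromApp" : "bdbCommissioning";
    }
    if (state.backupExists) {
        return "restoreAndStartup";
    }
    return state.configMatchesNv ? "startupFromApp" : "bdbCommissioning";
}

// An adapter configured by another instance, with a backup on disk:
const action = decideStartup({ hasConfigured: true, configMatchesNv: false, backupExists: true });
console.log(action);
```

With these inputs the function returns "bdbCommissioning" even though a backup exists, because the backup is only consulted on the branch where `hasConfigured` is not set.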
BDB Commissioning
Part of BDB commissioning, which forms a new ZigBee network, is to make sure that no other network with the same PAN ID lives on the same channel in the vicinity of the adapter. The Z-Stack adapter firmware does this by sending a Beacon Request when forming a new network. The adapter then waits for Beacon frames and determines whether the configured PAN ID is in use by another ZigBee network nearby.
If a network with the same PAN ID is present, the commissioning process keeps incrementing the PAN ID by one until a suitable (free) PAN ID is found. If the configured PAN ID is not in use by any nearby network, it is used as is. The resulting PAN ID is then stored within the NIB in the adapter's NV memory.
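The PAN ID selection described above can be sketched as follows. This models only the behaviour as described in this issue, not the Z-Stack firmware itself; the function name, the set of PAN IDs seen in Beacon frames, and the 16-bit wrap-around are assumptions.

```typescript
// Sketch of the "increment until free" behaviour; not taken from Z-Stack sources.
function selectPanId(configuredPanId: number, panIdsInUse: ReadonlySet<number>): number {
    let candidate = configuredPanId & 0xffff;
    // Bounded loop so the sketch terminates even if every PAN ID were taken.
    for (let attempts = 0; attempts < 0x10000; attempts++) {
        if (!panIdsInUse.has(candidate)) {
            return candidate;
        }
        // Assumption: the PAN ID wraps around at 16 bits.
        candidate = (candidate + 1) & 0xffff;
    }
    throw new Error("No free PAN ID available");
}

// Two neighbouring networks already use 0x1a62 and 0x1a63.
const selected = selectPanId(0x1a62, new Set([0x1a62, 0x1a63]));
console.log("0x" + selected.toString(16)); // prints "0x1a64"
```

The configured value stays 0x1a62 while the network actually runs on 0x1a64, which is the kind of divergence the rest of this issue is concerned with.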
This means that the contents of `ZCD_NV_PANID` and `ZCD_NV_NIB` may differ after network commissioning. Z2M discovers the resulting PAN ID through the `ZDO:extNwkInfo` SREQ/SRSP, but does not take it into account.
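Taking the reported PAN ID into account could start with a comparison like the one below. This is a minimal sketch: both arguments are plain numbers, and how they would be obtained (from the configuration and from the network info response) is left out on purpose; the function names are hypothetical.

```typescript
// Format a PAN ID as a 4-digit hex string for log messages.
function formatPanId(panId: number): string {
    return "0x" + panId.toString(16).padStart(4, "0");
}

// Returns a human-readable description of the mismatch, or null when both values agree.
// Both parameters are assumptions: the configured PAN ID and the one the adapter reports.
function describePanIdMismatch(configuredPanId: number, reportedPanId: number): string | null {
    if (configuredPanId === reportedPanId) {
        return null;
    }
    return `Configured PAN ID ${formatPanId(configuredPanId)} differs from ` +
        `PAN ID in use ${formatPanId(reportedPanId)}`;
}

console.log(describePanIdMismatch(0x1a62, 0x1a64));
```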
Problem
During an adapter upgrade or replacement, where the new adapter has previously been used by Z2M without having its NVRAM cleared (e.g. used in another network or used before with other parameters), Z2M starts BDB commissioning instead of restoring the NV backup to recover the previous network. This is due to the decision process implementation: the `hasConfigured` item is set by the previous Z2M instance but the parameters mismatch -> BDB commissioning. The expected action, however, would be a coordinator restore from backup.
Proposed Solutions
- NIB Error - terminate the application if the adapter's NV memory value of `ZCD_NV_PANID` (therefore the configured PAN ID) mismatches the PAN ID within `ZCD_NV_NIB`,
- NIB Management - if the `ZDO:extNwkInfo` SRSP mismatches `ZCD_NV_PANID` (therefore the configured PAN ID), update the PAN ID in the NIB and write it back to the device, then SREQ `SYS:resetReq`,
- Restore If Config Matches (suggested by @Koenkk) - restore the coordinator backup if NV memory and configuration mismatch, but configuration and coordinator backup match (I would still keep in mind the NIB mismatch thingy),
- “Sign” The NV Memory - instead of (or in addition to) the simple `hasConfigured` NV item in the adapter, introduce a value which uniquely represents every Z2M instance (e.g. UUIDv4) - if the instance ID mismatches (use in a different Z2M instance), the coordinator NV is restored rather than the network re-commissioned.
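As a starting point for the discussion, the "Restore If Config Matches" option could look roughly like this. It is a sketch under assumptions: the `NetworkParams` fields and the way two parameter sets are compared are made up for illustration, and `null` stands for "adapter not configured" or "no backup available".

```typescript
type StartupAction = "startupFromApp" | "restoreAndStartup" | "bdbCommissioning";

// Hypothetical subset of network parameters used for comparison.
interface NetworkParams {
    panId: number;
    extendedPanId: string;
    channel: number;
    networkKey: string;
}

function sameNetwork(a: NetworkParams, b: NetworkParams): boolean {
    return a.panId === b.panId
        && a.extendedPanId === b.extendedPanId
        && a.channel === b.channel
        && a.networkKey === b.networkKey;
}

// nv: parameters read from the adapter (null if not configured).
// backup: parameters stored in the coordinator backup (null if none exists).
function decideStartupWithRestore(
    config: NetworkParams,
    nv: NetworkParams | null,
    backup: NetworkParams | null,
): StartupAction {
    if (nv !== null && sameNetwork(config, nv)) {
        return "startupFromApp";
    }
    // NV memory is empty or belongs to another network: prefer the backup if it fits the config.
    if (backup !== null && sameNetwork(config, backup)) {
        return "restoreAndStartup";
    }
    return "bdbCommissioning";
}

const config: NetworkParams = { panId: 0x1a62, extendedPanId: "dddddddddddddddd", channel: 11, networkKey: "key-a" };
const foreignNv: NetworkParams = { ...config, panId: 0x2b73, networkKey: "key-b" };
console.log(decideStartupWithRestore(config, foreignNv, config));
```

For an adapter carrying NV data from another network, this returns "restoreAndStartup" instead of starting BDB commissioning. The NIB PAN ID mismatch is not handled here; a backup taken after a PAN ID increment would need the comparison to use the PAN ID actually in use.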
This might be 2 issues as well (the decision process and commissioned PAN ID validation). I am posting this issue to open a discussion in this matter since it has caused me several problems and by searching on forums and other issues I see other people struggle with this as well (sometimes without a gist where the problem lies).
If we can find a suitable solution, I am willing to do the implementation and submit a PR.
Issue Analytics
- Created 3 years ago
- Reactions: 6
- Comments: 39 (33 by maintainers)
I’ve been messing around with the low-level NIB stuff for a while in zigpy-znp so if you want to compare results, here’s what I am using right now: https://github.com/zigpy/zigpy-znp/blob/dev/zigpy_znp/znp/nib.py#L50-L123
A few other data type sizes change between the different architectures, so it’s not enough to tweak padding. I’ve documented all of the differences between the two structs in the comment of the `CC2531NIB` class right below the linked code snippet.
I also have complete NVRAM dumps for a CC2652R, though the format is a little different than what Z2M uses: https://github.com/zigpy/zigpy-znp/tree/dev/tests/nvram. I would be very interested in some complete CC2538 NVRAM dumps if you have any 😃
Regarding your main issue with PAN ID conflicts in newer Z-Stacks (especially with existing routers on the network): I have been trying to form a network with a PAN ID of `0xFFFF` (to let Z-Stack pick) and then overriding it after a reset by modifying the NIB: https://github.com/zigpy/zigpy-znp/blob/dev/zigpy_znp/zigbee/application.py#L385-L396. I believe the reset is necessary to make sure the frame counter in the NIB is rounded up by Z-Stack, since it’s only written to NVRAM periodically (and I think in multiple places in different versions of Z-Stack).
In my tests it has worked and causes my CC2652R to happily form a conflicting network, even with other routers sharing that same PAN ID.
Not entirely related to this discussion, but if you want to collaborate on a unified network backup format between Z2M and zigpy/ZHA and possibly get deCONZ on board, it would be awesome: https://github.com/zigpy/zigpy/issues/557#issuecomment-745072411
I have updated the original comment with data from CC2652’s provided by you all: https://github.com/Koenkk/zigbee-herdsman/issues/286#issuecomment-758830202
@puddly: First of all, thanks for sharing your knowledge 😉
Here are CC2538 NVRAM dumps for you (created with `zigpy_znp/tools/nvram_read.py`): https://gist.github.com/castorw/d697a6b0952c9577038296574b872703 https://gist.github.com/castorw/df2cf1c75c4436897925628f067e5063
Regarding the `nwkState` (the only notable difference between 8-bit/16-bit addressing) - I don’t think we need to care about this parameter. We can just restore it shifted as is. And even if you want to read it, you still read the same position, since the MCUs are all little-endian and the maximum value comes nowhere near 0xff.
The idea of a unified backup format seems nice. And the finding that the NIB can be safely altered is great. I will dedicate some time to testing this with CC2531/CC2538 and conversions between NIBs as well.
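The little-endian argument about `nwkState` can be checked with a few lines. This sketch does not use a real NIB layout: the byte values and the offset are made up, and it only demonstrates that a small value reads the same as an 8-bit and as a 16-bit little-endian field at the same position.

```typescript
// Read an unsigned field of 1 or 2 bytes at the given offset, little-endian.
function readField(bytes: Uint8Array, offset: number, width: 1 | 2): number {
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    return width === 1 ? view.getUint8(offset) : view.getUint16(offset, true);
}

// Hypothetical bytes: a small state value stored in a 16-bit little-endian field.
const sample = new Uint8Array([0x08, 0x00, 0xaa]);

const asUint8 = readField(sample, 0, 1);
const asUint16 = readField(sample, 0, 2);
console.log(asUint8 === asUint16); // prints "true"
```

This only holds while the value fits in one byte, because the low byte comes first in little-endian order and the high byte stays zero.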
@Koenkk: To get this issue back on track (since it seems to have spiralled a bit), I would like to split this into separate issues and close this one:
Agreed?