Prevent stale command subscription entries in Device Connection Service
Scenario:
A device that normally interacts only via a gateway has sent an HTTP request with a ttd to the Hono HTTP protocol adapter in order to receive a command message.
After the command message has been received, the protocol adapter removes the command subscription. As part of that, an error occurs when the adapter invokes the removeCommandHandlingAdapterInstance method of the Device Connection Service. This leaves a stale commandHandlingAdapterInstance entry for the device in the Device Connection Service.
From then on, for all subsequent command subscriptions made only by the device’s gateway, the stale commandHandlingAdapterInstance entry effectively prevents commands from being delivered to the gateway. This is because the device-specific commandHandlingAdapterInstance entry takes precedence over the gateway’s entry.
The situation is only resolved once a command subscription for the device itself (not the gateway) is created and removed again, with the removeCommandHandlingAdapterInstance invocation succeeding this time.
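To make the precedence rule concrete, here is a minimal Java sketch. The class AdapterInstanceLookup and its methods are hypothetical. They model only the lookup behavior described above, using a plain in-memory map from device or gateway ID to adapter instance ID, and they are not the actual Device Connection API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical, simplified model of the lookup described in the issue:
 * an entry for the device itself takes precedence over the entry of its gateway.
 * Not Hono's actual Device Connection API.
 */
public class AdapterInstanceLookup {

    // device or gateway ID -> ID of the protocol adapter instance handling its commands
    private final Map<String, String> commandHandlingAdapterInstances = new HashMap<>();

    public void setCommandHandlingAdapterInstance(String deviceId, String adapterInstanceId) {
        commandHandlingAdapterInstances.put(deviceId, adapterInstanceId);
    }

    public void removeCommandHandlingAdapterInstance(String deviceId) {
        commandHandlingAdapterInstances.remove(deviceId);
    }

    /** Prefers the device-specific entry and only falls back to the gateway's entry. */
    public Optional<String> findAdapterInstance(String deviceId, String gatewayId) {
        String deviceEntry = commandHandlingAdapterInstances.get(deviceId);
        if (deviceEntry != null) {
            return Optional.of(deviceEntry);
        }
        return Optional.ofNullable(commandHandlingAdapterInstances.get(gatewayId));
    }

    public static void main(String[] args) {
        AdapterInstanceLookup lookup = new AdapterInstanceLookup();
        // Entry created for the ttd request; its removal failed, so it is now stale.
        lookup.setCommandHandlingAdapterInstance("device-1", "http-adapter-1");
        // The gateway subscribes for commands via a different adapter instance.
        lookup.setCommandHandlingAdapterInstance("gateway-1", "mqtt-adapter-1");
        // The stale device entry wins, so commands are routed to the wrong adapter instance.
        System.out.println(lookup.findAdapterInstance("device-1", "gateway-1").orElse("none"));
    }
}
```

Running main prints http-adapter-1. Only after the device entry is removed does the lookup fall back to the gateway’s mqtt-adapter-1 entry.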
A straightforward way to prevent or automatically resolve such a situation would be to:
- Implement a retry mechanism for the removeCommandHandlingAdapterInstance invocation that keeps retrying for as long as it fails.
This is easy to implement, but it doesn’t necessarily prevent the problem above if the adapter is restarted before the commandHandlingAdapterInstance entry has been successfully removed.
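A minimal sketch of such a retry follows. It assumes the removal call can be wrapped in a java.util.function.BooleanSupplier that returns true on success. RemovalRetry and retryUntilRemoved are hypothetical names. The attempt limit and the exponential backoff are illustrative additions, since the proposal above simply retries for as long as the call fails.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical retry helper for the removal call. The BooleanSupplier stands in for an
 * invocation of removeCommandHandlingAdapterInstance that returns true on success;
 * it is not Hono's client API.
 */
public final class RemovalRetry {

    private RemovalRetry() {
    }

    /**
     * Retries the removal with exponential backoff (capped at maxDelayMillis).
     * Returns true as soon as an attempt succeeds, false after maxAttempts failures
     * or if the waiting thread is interrupted.
     */
    public static boolean retryUntilRemoved(BooleanSupplier removal, int maxAttempts,
            long initialDelayMillis, long maxDelayMillis) {
        long delay = initialDelayMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (removal.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and give up; the entry may remain stale.
                    Thread.currentThread().interrupt();
                    return false;
                }
                delay = Math.min(delay * 2, maxDelayMillis);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated removal that fails twice and then succeeds.
        boolean removed = retryUntilRemoved(() -> ++calls[0] >= 3, 10, 10, 100);
        System.out.println(removed + " after " + calls[0] + " attempts");
    }
}
```

Running main prints "true after 3 attempts". The retry state lives only in the memory of the running adapter. A restart in the middle of the loop therefore still leaves the stale entry behind, which is the limitation noted above.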
Therefore it looks like we need to implement the following instead (or as well):
- Let commandHandlingAdapterInstance entries expire after a certain time, and have the protocol adapter send periodic refresh requests while a subscription is still active.
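The expiry-plus-refresh idea can be sketched as a map whose entries carry an expiration time. ExpiringAdapterInstanceMap and its methods are hypothetical. The protocol adapter is assumed to call setOrRefresh periodically while its subscription is active. The current time is passed in explicitly so that the example stays deterministic.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of expiring commandHandlingAdapterInstance entries.
 * Entries that are no longer refreshed (e.g. because the adapter was restarted)
 * stop being returned once their lifespan has passed.
 */
public class ExpiringAdapterInstanceMap {

    private static final class Entry {
        final String adapterInstanceId;
        final Instant expiresAt;

        Entry(String adapterInstanceId, Instant expiresAt) {
            this.adapterInstanceId = adapterInstanceId;
            this.expiresAt = expiresAt;
        }
    }

    // device ID -> adapter instance ID plus expiration time
    private final Map<String, Entry> entries = new ConcurrentHashMap<>();

    /** Creates the entry or extends its lifespan (the periodic refresh request). */
    public void setOrRefresh(String deviceId, String adapterInstanceId, Duration lifespan, Instant now) {
        entries.put(deviceId, new Entry(adapterInstanceId, now.plus(lifespan)));
    }

    /** Regular removal when the subscription ends normally. */
    public void remove(String deviceId) {
        entries.remove(deviceId);
    }

    /** Returns the adapter instance only if the entry has not expired yet. */
    public Optional<String> get(String deviceId, Instant now) {
        Entry entry = entries.get(deviceId);
        if (entry == null || !now.isBefore(entry.expiresAt)) {
            return Optional.empty();
        }
        return Optional.of(entry.adapterInstanceId);
    }

    /** Drops expired entries; could run as a periodic task inside the service. */
    public void purgeExpired(Instant now) {
        entries.values().removeIf(entry -> !now.isBefore(entry.expiresAt));
    }

    public int size() {
        return entries.size();
    }
}
```

With this design, the refresh interval would presumably need to be noticeably shorter than the lifespan. Otherwise a single delayed refresh could make the entry of a still-active subscription expire.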
Top GitHub Comments
Stale records would first be excluded from getCommandHandlingAdapterInstances results through their association with an offline adapter instance (the Device Connection Service implementation has to exclude these entries). They would eventually be removed by a periodic cleanup task inside the Device Connection Service implementation: remove “offline” markers older than, say, 24 hours, and remove all command handling adapter instance mapping entries that contain the adapter instance of such an “offline” marker. (I’ve also edited the comment above to make this a bit clearer.)
That means that command subscriptions without a ttd need no lifespan and no periodic update/refresh operations (triggered by the protocol adapter that initiated the subscription). For command subscriptions with a ttd, I think attaching the lifespan to the mapping entry can still be useful as an additional means of keeping the set of valid mapping entries consistent.
Basically, the advantage of the second solution is that a single Device Connection API request can deactivate all mapping entries that have gone stale because an adapter instance was killed. The stale mapping entries then don’t have to be removed one by one by a protocol adapter that gets a “no credit” error. And if the entries aren’t actually stale, because there was just a glitch in the AMQP messaging network, a single Device Connection API request (removing the “offline” marker) can reactivate all the relevant mapping entries.
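A minimal in-memory sketch of the offline-marker approach discussed in these comments follows. OfflineMarkerRegistry and all of its methods are hypothetical. findActiveAdapterInstances only plays the role of the filtering done in getCommandHandlingAdapterInstances and does not reproduce its real signature. A real Device Connection Service implementation would presumably keep this data in a shared store rather than in local maps. The 24-hour threshold is passed in as maxAge.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: mapping entries plus "offline" markers per adapter instance. */
public class OfflineMarkerRegistry {

    // device ID -> adapter instance ID
    private final Map<String, String> mappings = new HashMap<>();
    // adapter instance ID -> time at which it was marked offline
    private final Map<String, Instant> offlineMarkers = new HashMap<>();

    public synchronized void setMapping(String deviceId, String adapterInstanceId) {
        mappings.put(deviceId, adapterInstanceId);
    }

    /** One request deactivates every mapping that references the adapter instance. */
    public synchronized void markOffline(String adapterInstanceId, Instant now) {
        offlineMarkers.putIfAbsent(adapterInstanceId, now);
    }

    /** One request reactivates them, e.g. after a transient messaging network glitch. */
    public synchronized void markOnline(String adapterInstanceId) {
        offlineMarkers.remove(adapterInstanceId);
    }

    /** Returns only mappings whose adapter instance is not marked offline. */
    public synchronized Map<String, String> findActiveAdapterInstances(List<String> deviceIds) {
        Map<String, String> result = new HashMap<>();
        for (String deviceId : deviceIds) {
            String instance = mappings.get(deviceId);
            if (instance != null && !offlineMarkers.containsKey(instance)) {
                result.put(deviceId, instance);
            }
        }
        return result;
    }

    /** Periodic cleanup: drops markers older than maxAge and all mappings of those instances. */
    public synchronized void cleanup(Instant now, Duration maxAge) {
        Instant cutoff = now.minus(maxAge);
        offlineMarkers.entrySet().removeIf(marker -> {
            if (marker.getValue().isBefore(cutoff)) {
                mappings.values().removeIf(instance -> instance.equals(marker.getKey()));
                return true;
            }
            return false;
        });
    }

    public synchronized int mappingCount() {
        return mappings.size();
    }
}
```

In this sketch, deactivating or reactivating all entries of a killed adapter instance touches only the offlineMarkers map. Only the cleanup task iterates over the mapping entries.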
@sophokles73 Yes, I’m currently working on that.