Find ways to reduce cache operations when command consumers get (temporarily) disconnected
When an AMQP/MQTT device that is listening for command messages gets disconnected from the protocol adapter, the association between device id and protocol adapter instance id gets removed in the device connection/command router service (removeCommandHandlingAdapterInstance / unregisterCommandConsumer). When using the Infinispan cache based service implementation, this means a cache.remove operation.
When a large number of devices get disconnected at the same time, quite a lot of cache traffic will be generated.
This is also because removeCommandHandlingAdapterInstance / unregisterCommandConsumer triggers 2 cache operations, getWithMetadata and removeWithVersion (making sure that no newer mapping entry gets removed).
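To illustrate why each unregistration costs two cache round trips, here is a minimal sketch of a version-checked removal. It assumes a hypothetical in-memory `VersionedCache` that only mimics the getWithMetadata / removeWithVersion pair; it is not the Infinispan client API, and the class names, the `unregister` method, and the `operationCount` field are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a versioned cache (not the Infinispan API). */
class VersionedCache {

    /** A cached value together with the version assigned when it was written. */
    static final class Entry {
        final String value;
        final long version;

        Entry(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private long nextVersion = 1;

    /** Counts cache operations, purely to make the cost visible. */
    int operationCount = 0;

    synchronized void put(String key, String value) {
        operationCount++;
        entries.put(key, new Entry(value, nextVersion++));
    }

    synchronized Entry getWithMetadata(String key) {
        operationCount++;
        return entries.get(key);
    }

    /** Removes the entry only if it still has the given version. */
    synchronized boolean removeWithVersion(String key, long version) {
        operationCount++;
        Entry current = entries.get(key);
        if (current == null || current.version != version) {
            return false;
        }
        entries.remove(key);
        return true;
    }
}

/** Hypothetical registry mapping device ids to adapter instance ids. */
class CommandConsumerRegistry {

    private final VersionedCache cache;

    CommandConsumerRegistry(VersionedCache cache) {
        this.cache = cache;
    }

    /** Removes the mapping only if it still points to the given adapter instance. */
    boolean unregister(String deviceId, String adapterInstanceId) {
        // Operation 1: read the current mapping and its version.
        VersionedCache.Entry entry = cache.getWithMetadata(deviceId);
        if (entry == null || !entry.value.equals(adapterInstanceId)) {
            return false; // no mapping, or a newer mapping to another instance
        }
        // Operation 2: remove only if nobody overwrote the entry in between.
        return cache.removeWithVersion(deviceId, entry.version);
    }
}
```

In this sketch, a disconnect reported by an adapter instance that no longer owns the mapping stops after the read, while a successful removal always needs both operations.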
In most cases however, the removal of these kinds of cache entries isn’t even necessary. When the corresponding device reconnects and subscribes for commands again, the cache entry will be overwritten.
There is only one case where a remaining, obsolete cache entry will cause problems: if a gateway subscribes for commands for all its devices, after one of these devices has previously subscribed (and unsubscribed) for commands (see #1858).
The aim here is to find a way to reduce unnecessary cache.remove operations while preventing the above possible issue with stale entries.
Top GitHub Comments
In order to make it a little more robust, the adapters should send a batch as soon as the batch size has been reached, i.e. they should not wait for the interval to have passed before sending the batch in this case.
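The flush rule described in this comment could look roughly like the following sketch. It assumes a hypothetical `DisconnectBatcher` class whose `onIntervalElapsed` method is called by some external periodic timer; the class, its method names, and the `sender` callback are illustrative and not taken from the Hono code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical batcher that collects ids of disconnected devices. */
class DisconnectBatcher {

    private final int maxBatchSize;
    private final Consumer<List<String>> sender;
    private final List<String> pending = new ArrayList<>();

    DisconnectBatcher(int maxBatchSize, Consumer<List<String>> sender) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        this.maxBatchSize = maxBatchSize;
        this.sender = sender;
    }

    /** Records a disconnect and sends right away once the batch is full. */
    synchronized void deviceDisconnected(String deviceId) {
        pending.add(deviceId);
        if (pending.size() >= maxBatchSize) {
            flush(); // do not wait for the interval to pass
        }
    }

    /** Assumed to be invoked by a periodic timer; sends whatever has piled up. */
    synchronized void onIntervalElapsed() {
        flush();
    }

    private void flush() {
        if (pending.isEmpty()) {
            return; // nothing to send, skip the request entirely
        }
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        sender.accept(batch);
    }
}
```

Flushing on size bounds how much disconnect information can be lost if the adapter instance crashes, while the interval-based flush keeps small batches from waiting indefinitely.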
When an adapter instance crashes, the worst thing that can happen is that we lose all the piled-up device disconnected information and the Command Router will not be able to send the corresponding disconnected events downstream. So for the time it takes the devices to connect to another adapter instance, downstream applications will still believe that the devices are connected, but commands that they send to these devices will be failed by the Command Router, right? This time period should be short, however, and the applications need to be prepared to handle failed commands anyway: even if they have not yet received a disconnect event for a device, the device may already have disconnected from the adapter and we simply have not noticed yet (due to heartbeat intervals).
@sophokles73 yes, I’ve removed it from 2.2.0. I have also created #3445 as a follow-up concerning the batch requests. Discussion can be continued there.