Find ways to reduce cache operations when command consumers get (temporarily) disconnected

When an AMQP/MQTT device that is listening for command messages gets disconnected from the protocol adapter, the association between the device id and the protocol adapter instance id gets removed in the command router service (formerly the device connection service) via unregisterCommandConsumer (formerly removeCommandHandlingAdapterInstance). With the Infinispan cache based service implementation, this means a cache.remove operation.

When a large number of devices get disconnected at the same time, a lot of cache traffic is generated. This is amplified by the fact that unregisterCommandConsumer (formerly removeCommandHandlingAdapterInstance) triggers 2 cache operations per device: getWithMetadata and removeWithVersion (the version check makes sure that no newer mapping entry gets removed).
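To illustrate why each unregistration costs two cache operations, here is a minimal Java sketch of a read followed by a version-checked remove. It does not use the Infinispan client API: VersionedCache is a hypothetical in-memory stand-in that only borrows the operation names getWithMetadata and removeWithVersion from the description above, and CommandConsumerRegistry, register, unregister and lookup are illustrative names, not Hono code. The check against the adapter instance id is likewise an assumption about how "no newer mapping entry gets removed" could be enforced.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-memory stand-in for a versioned cache (not the Infinispan API). */
final class VersionedCache {

    /** A cached value together with the version assigned when it was written. */
    static final class Entry {
        final String value;
        final long version;

        Entry(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final AtomicLong nextVersion = new AtomicLong();

    /** Stores a value under a fresh version, overwriting any existing entry. */
    void put(String key, String value) {
        entries.put(key, new Entry(value, nextVersion.incrementAndGet()));
    }

    /** First cache operation: read the value along with its version. */
    Optional<Entry> getWithMetadata(String key) {
        return Optional.ofNullable(entries.get(key));
    }

    /** Second cache operation: remove only if the version is unchanged. */
    boolean removeWithVersion(String key, long version) {
        Entry current = entries.get(key);
        return current != null
                && current.version == version
                && entries.remove(key, current);
    }
}

/** Illustrative mapping of device id to adapter instance id (assumed names). */
final class CommandConsumerRegistry {

    private final VersionedCache cache = new VersionedCache();

    void register(String deviceId, String adapterInstanceId) {
        cache.put(deviceId, adapterInstanceId);
    }

    /** Removes the mapping only if it still points to the given adapter instance. */
    boolean unregister(String deviceId, String adapterInstanceId) {
        return cache.getWithMetadata(deviceId)
                .filter(entry -> entry.value.equals(adapterInstanceId))
                .map(entry -> cache.removeWithVersion(deviceId, entry.version))
                .orElse(false);
    }

    Optional<String> lookup(String deviceId) {
        return cache.getWithMetadata(deviceId).map(entry -> entry.value);
    }
}
```

In this sketch, a device that has already re-registered via another adapter instance keeps its newer mapping: the late unregister call from the old adapter finds a different value (or version) and removes nothing, but it still pays for the read.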

In most cases however, the removal of these kinds of cache entries isn’t even necessary. When the corresponding device reconnects and subscribes for commands again, the cache entry will be overwritten.

There is only one case where a remaining, obsolete cache entry will cause problems: if a gateway subscribes for commands for all its devices, after one of these devices has previously subscribed (and unsubscribed) for commands (see #1858).

The aim here is to find a way to reduce unnecessary cache.remove operations while preventing the above possible issue with stale entries.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 16 (16 by maintainers)

Top GitHub Comments

sophokles73 commented, Jul 13, 2022 (1 reaction)

I would suggest going for the first option here, introducing the unregister-cmd-consumers batch requests sent in a configurable interval, and moving the “connected” and “disconnected” events to the command router.

In order to make it a little more robust, the adapters should send a batch as soon as the batch size has been reached, i.e. they should not wait for the interval to have passed before sending the batch in this case.
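The flush rule suggested here (send when the batch is full, otherwise when the interval elapses) could look roughly like the following Java sketch. UnregisterBatcher, deviceDisconnected and flush are hypothetical names chosen for illustration, and the sender callback stands in for whatever would actually issue the batch request to the command router; none of this is taken from the Hono code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper that collects disconnected device ids into batches. */
final class UnregisterBatcher {

    private final int maxBatchSize;
    private final Consumer<List<String>> sender;
    private final List<String> pending = new ArrayList<>();

    UnregisterBatcher(int maxBatchSize, Consumer<List<String>> sender) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be at least 1");
        }
        this.maxBatchSize = maxBatchSize;
        this.sender = sender;
    }

    /** Records a disconnected device and sends right away once the batch is full. */
    synchronized void deviceDisconnected(String deviceId) {
        pending.add(deviceId);
        if (pending.size() >= maxBatchSize) {
            flush();
        }
    }

    /** Sends whatever is pending; intended to be called by a timer at the configured interval. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return; // nothing to send, skip the request entirely
        }
        List<String> batch = List.copyOf(pending);
        pending.clear();
        sender.accept(batch);
    }
}
```

The interval part can be wired up by scheduling flush on a java.util.concurrent.ScheduledExecutorService, for example with scheduleAtFixedRate(batcher::flush, interval, interval, TimeUnit.SECONDS). Anything still in pending when the adapter instance crashes is lost, which is the failure case discussed in the next paragraph.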

When an adapter instance crashes, the worst thing that can happen is that we lose all the piled-up device disconnected information and the Command Router will not be able to send the corresponding disconnected events downstream. So for the time it takes the devices to connect to another adapter instance, downstream applications will still believe that the devices are connected, but commands that they send to these devices will be failed by the Command Router, right? This time period should be short, however, and the applications need to be prepared to handle failed commands anyway: even if they have not yet received a disconnected event for a device, the device may already have disconnected from the adapter without us having noticed yet (due to heartbeat intervals).

calohmn commented, Nov 22, 2022 (0 reactions)

@sophokles73 yes, I’ve removed it from 2.2.0. I have also created #3445 as a follow-up concerning the batch requests. Discussion can be continued there.
