AddressController/ResourceChecker improvements

Why

Currently AddressController#onUpdate is called periodically with a list of all addresses currently defined in the system. This informs the AddressController about new addresses and allows it to progress the provisioning of existing addresses.

The current design has a problem: the list of addresses may include stale address representations. These arise when a previous invocation of AddressController#onUpdate caused the address to be updated (and its underlying configmap rewritten), but the update to the configmap has not yet been reported to the Watcher by OpenShift. Stale addresses are most likely to be seen when provisioning many addresses in a short space of time.

These stale addresses were the cause of #1781. There, the staleness allowed the address controller to spuriously reassign an address to a different broker during the provisioning process. #1988 works around the problem by preventing the address controller from making the write based on stale data. Internally, however, the controller is still making spurious determinations.

Proposal

The interface between the Controller and the Watcher should be changed and responsibilities refactored.

The Watcher's responsibility should be merely to notify a controller about new/removed/updated objects. To do this the Watcher, internally, will continue to use a Kubernetes watcher coupled with a periodic resync (to avoid the possibility of missed updates). It would notify the controller by invoking a method like this:

#onUpdate(Set<?> added, Set<?> removed, Set<?> updated)
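As a rough illustration of that split, the sketch below shows how a watcher could turn full snapshots (from a watch event or a periodic resync) into the three sets. The names `ResourceListener` and `SnapshotDiffer`, and the use of a name-to-version map as the snapshot, are assumptions made for this example and are not taken from the EnMasse code base. Staying silent when nothing changed is also an assumption about the intended behaviour.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical listener shaped like the proposed onUpdate signature.
interface ResourceListener<T> {
    void onUpdate(Set<T> added, Set<T> removed, Set<T> updated);
}

// Hypothetical helper: compares snapshots (name -> version) and reports only the differences.
final class SnapshotDiffer {
    private Map<String, String> previous = new HashMap<>();

    // Called with the full current snapshot on every watch event or periodic resync.
    void resync(Map<String, String> current, ResourceListener<String> listener) {
        Set<String> added = new HashSet<>();
        Set<String> removed = new HashSet<>();
        Set<String> updated = new HashSet<>();
        for (Map.Entry<String, String> entry : current.entrySet()) {
            String oldVersion = previous.get(entry.getKey());
            if (oldVersion == null) {
                added.add(entry.getKey());
            } else if (!oldVersion.equals(entry.getValue())) {
                updated.add(entry.getKey());
            }
        }
        for (String name : previous.keySet()) {
            if (!current.containsKey(name)) {
                removed.add(name);
            }
        }
        previous = new HashMap<>(current);
        // Assumption: the listener is not invoked when nothing changed.
        if (!added.isEmpty() || !removed.isEmpty() || !updated.isEmpty()) {
            listener.onUpdate(added, removed, updated);
        }
    }
}
```

With this shape, a resync that finds no differences produces no callback, so the controller cannot rely on the watcher to drive its periodic work.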

Currently the Controllers rely on the Watcher's periodic call of onUpdate() to allow them to progress things like address provisioning. This invocation happens even if there have been no changes. In the new scheme, the Controllers themselves would become active objects. They would use a thread of their own, or use Vertx callbacks, to progress their work.

In the case of AddressController, it would internally keep a representation of all addresses in the system. It would update this state in sympathy with its provisioning decisions, thus avoiding the ‘stale’ issue. I think it would probably be best if the onUpdate() method delivered sets of address names rather than whole address objects.
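A minimal sketch of that idea, assuming a hypothetical class `AddressStateSketch` that is not part of EnMasse: the controller records each broker assignment in its own state the moment it makes the decision, so a later pass never sees the address as unassigned, whatever the watcher has or has not reported yet. The method names and the string-based broker identifiers are illustrative only.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical controller-side state, keyed by address name.
final class AddressStateSketch {
    private final Set<String> known = ConcurrentHashMap.newKeySet();
    private final Map<String, String> brokerByAddress = new ConcurrentHashMap<>();

    // Receives names only, as suggested; the controller owns the address state.
    void onUpdate(Set<String> added, Set<String> removed, Set<String> updated) {
        known.addAll(added);
        known.removeAll(removed);
        brokerByAddress.keySet().removeAll(removed);
        // 'updated' names would trigger a re-read of those addresses; omitted in this sketch.
    }

    // Records the decision immediately and returns the broker that actually holds the address.
    String assignBroker(String address, String candidateBroker) {
        if (!known.contains(address)) {
            throw new IllegalArgumentException("unknown address: " + address);
        }
        return brokerByAddress.computeIfAbsent(address, name -> candidateBroker);
    }

    // Addresses that still need a provisioning decision.
    Set<String> unassigned() {
        Set<String> result = new HashSet<>(known);
        result.removeAll(brokerByAddress.keySet());
        return result;
    }
}
```

Calling `assignBroker` a second time with a different candidate returns the original broker, which is the property that rules out the spurious reassignment described above.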

Testing

The unit tests would be rewritten. Existing system tests would not require changes.

Documentation

No user visible changes.

Tasklist

This would be a large refactoring touching many parts of the system. A POC implementation would be made on a branch and iterated on until a suitable implementation is agreed on. At that point, the other controller implementations would be ported.

Issue Analytics

State:
Created 5 years ago
Comments:5 (5 by maintainers)

Top GitHub Comments

k-wall commented, Nov 15, 2018

One side effect of the current version of onUpdate() is that the controllers will check the status of the addresses on each invocation. In this proposal, I assume the controller would instead spawn a separate thread to perform this periodic sync?

Yes - this is what I meant when I said active object. Each controller would either have a thread of its own, or use the work facilities of Vertx (if appropriate). Either way, it would be arranged to awaken periodically in order to progress any work that needs to be done (address provisioning etc.) even if no new addresses were defined.

In that case, there needs to be some synchronization to ensure only 1 thread is changing the address state.

I was imagining the onUpdate method updating a thread-safe data structure and then somehow waking up the controller's thread (or nudging Vertx) so that it processes the adds/deletes/updates in a timely fashion. The design would be clear on each thread's responsibility.
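One way this hand-off could look, sketched with a plain `java.util.concurrent` queue rather than Vertx: the watcher thread only enqueues, and the controller thread blocks on the queue with a timeout so that it wakes either when a change arrives or when the resync interval elapses. `Change`, `ControllerLoopSketch`, and the `"progress"` log entry are placeholders invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical change notification handed from the watcher thread to the controller thread.
final class Change {
    enum Kind { ADDED, REMOVED, UPDATED }
    final Kind kind;
    final String name;
    Change(Kind kind, String name) { this.kind = kind; this.name = name; }
}

final class ControllerLoopSketch {
    private final BlockingQueue<Change> inbox = new LinkedBlockingQueue<>();
    private final List<String> log = new ArrayList<>(); // touched only by the controller thread

    // Watcher thread: enqueue and return; this also wakes a controller blocked in poll().
    void onUpdate(Change change) {
        inbox.offer(change);
    }

    // Controller thread: wait for a change or for the resync interval, whichever comes first.
    void runOnce(long resyncMillis) {
        List<Change> batch = new ArrayList<>();
        try {
            Change first = inbox.poll(resyncMillis, TimeUnit.MILLISECONDS);
            if (first != null) {
                batch.add(first);
                inbox.drainTo(batch); // pick up anything else that queued meanwhile
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return;
        }
        for (Change change : batch) {
            log.add(change.kind + ":" + change.name);
        }
        log.add("progress"); // stands in for progressing provisioning work
    }

    List<String> log() {
        return log;
    }
}
```

Only the controller thread mutates address state here; the watcher thread's sole responsibility is to put changes on the queue, which keeps each thread's role explicit.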

Note that by moving the responsibility of keeping track of the current/desired state into the controller, one would also need to update the controllers in address-space-controller and keycloak-controller accordingly.

Understood.

I’m a bit worried that if the controller is relying on state of new/deleted/updated being reported, it is more vulnerable to an event being missed or multiple events masking each other, but perhaps this is anyway the case with the underlying fifoqueue etc.

I don’t think the suggestion brings any new risk.

k-wall commented, Jun 12, 2020

@lulf I think we let it die. closing
