Support scoped event handler statuses
Setting self.model.unit.status across multiple event handlers doesn't work in charms. The Operator Framework needs to provide some better way of handling this; otherwise charm authors are forced to write one event handler that responds to every event.
A simple example of the issue:
- Event handler A sets BlockedStatus
- Event handler B sets BlockedStatus
- Event handler A runs successfully and sets ActiveStatus
- Whoops, now event handler B's BlockedStatus got lost
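The overwrite can be reproduced without Juju. The sketch below uses hypothetical stand-in classes (Status, Unit, handler_a, handler_b are not the ops API) that model only one fact: a unit holds exactly one status, so whichever handler writes last wins.

```python
from dataclasses import dataclass


# Hypothetical stand-ins, not the ops API: a unit holds exactly one status.
@dataclass(frozen=True)
class Status:
    kind: str
    message: str = ""


class Unit:
    def __init__(self) -> None:
        self.status = Status("maintenance")


def handler_a(unit: Unit, ready: bool) -> None:
    unit.status = Status("active") if ready else Status("blocked", "A: config missing")


def handler_b(unit: Unit, ready: bool) -> None:
    unit.status = Status("active") if ready else Status("blocked", "B: relation missing")


unit = Unit()
handler_a(unit, ready=False)  # A blocks
handler_b(unit, ready=False)  # B blocks, overwriting A's message
handler_a(unit, ready=True)   # A recovers and sets active
print(unit.status)            # Status(kind='active', message=''): B's block is gone
```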
See the discussion here for more mitigation attempts, but everything proposed so far ultimately just moves the one-event-handler-to-rule-them-all into something else, such as _is_ready or update_status helper functions.
See also this discussion, which is about relation-level statuses. Allowing relation-level statuses is a pragmatic solution: it doesn't solve the general case, but it does solve the case of one application install event handler plus N relation event handlers, a pattern that applies to most or all of the K8s charms written today.
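One way relation-level (or, more generally, scoped) statuses could work is to keep one status per scope and derive the unit status from the most severe outstanding entry. This is a sketch, not an existing ops API: ScopedStatus, the scope names, and the blocked > waiting > maintenance > active ordering are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Assumed ordering: when several scopes report problems, the worst kind wins.
SEVERITY = {"active": 0, "maintenance": 1, "waiting": 2, "blocked": 3}


@dataclass
class ScopedStatus:
    """Hypothetical helper: one (kind, message) per scope, e.g. "app" or a relation name."""

    scopes: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def set(self, scope: str, kind: str, message: str = "") -> None:
        if kind == "active":
            self.scopes.pop(scope, None)  # a healthy scope has nothing to report
        else:
            self.scopes[scope] = (kind, message)

    def unit_status(self) -> Tuple[str, str]:
        if not self.scopes:
            return ("active", "")
        worst = max((kind for kind, _ in self.scopes.values()), key=SEVERITY.__getitem__)
        messages = [
            f"{scope}: {message}"
            for scope, (kind, message) in sorted(self.scopes.items())
            if kind == worst
        ]
        return (worst, "; ".join(messages))


statuses = ScopedStatus()
statuses.set("app", "blocked", "config item kind must be set")
statuses.set("istio-pilot", "waiting", "waiting for relation data")
print(statuses.unit_status())  # ('blocked', 'app: config item kind must be set')
statuses.set("app", "active")
print(statuses.unit_status())  # ('waiting', 'istio-pilot: waiting for relation data')
```

Each relation handler only ever touches its own scope, so marking one scope active cannot erase another scope's problem.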
As a concrete example of what the title means, this decorator lets an event handler care only about its own status, without having to worry about the state of the rest of the application:
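```python
import subprocess
from functools import wraps

from jinja2 import Environment, FileSystemLoader
from ops.charm import CharmBase
from ops.main import main
from ops.model import ActiveStatus, BlockedStatus, WaitingStatus
# Assumed source of get_interface / NoVersionsListed: the serialized-data-interface library.
from serialized_data_interface import NoVersionsListed, get_interface

# Shared by every decorated handler. A dict created inside the decorator would give
# each handler its own copy, so statuses from different handlers would never combine.
# Still in-memory only: each Juju hook runs in a new process, so nothing persists.
STATUSES = {}


def handle_status_properly(f):
    """Updates status after decorated event handler is run.

    WARNING: For demonstration purposes only. Does not persist statuses.
    """

    @wraps(f)
    def wrapper(self, event):
        try:
            status = f(self, event)
        except Exception as err:
            # Any unexpected error becomes a BlockedStatus for this handler.
            status = BlockedStatus(str(err))
        if status is None:
            # Handler succeeded: clear only its own entry.
            STATUSES.pop(f.__name__, None)
        else:
            STATUSES[f.__name__] = status
        if STATUSES:
            # Use the type of the first recorded status and join all messages.
            status_type = type(list(STATUSES.values())[0])
            self.model.unit.status = status_type('; '.join(st.message for st in STATUSES.values()))
        else:
            self.model.unit.status = ActiveStatus()

    return wrapper


class Operator(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        # The same handler observes every event that can change its status.
        self.framework.observe(self.on.install, self.install)
        self.framework.observe(self.on["istio-pilot"].relation_changed, self.install)
        self.framework.observe(self.on.config_changed, self.install)

    @handle_status_properly
    def install(self, event):
        if not self.unit.is_leader():
            return WaitingStatus("Waiting for leadership")
        if self.model.config['kind'] not in ('ingress', 'egress'):
            return BlockedStatus('Config item `kind` must be set')
        if not self.model.relations['istio-pilot']:
            return BlockedStatus("Waiting for istio-pilot relation")
        try:
            pilot = get_interface(self, "istio-pilot")
        except NoVersionsListed as err:
            return WaitingStatus(str(err))
        if not pilot.get_data():
            return BlockedStatus("Waiting for istio-pilot relation data")
        pilot = list(pilot.get_data().values())[0]

        # Render src/manifest.yaml and apply it with the bundled kubectl.
        env = Environment(loader=FileSystemLoader('src'))
        template = env.get_template('manifest.yaml')
        rendered = template.render(
            kind=self.model.config['kind'],
            namespace=self.model.name,
            pilot_host=pilot['service-name'],
            pilot_port=pilot['service-port'],
        )
        subprocess.run(["./kubectl", "apply", "-f-"], input=rendered.encode('utf-8'), check=True)
        # Returning None marks this handler as healthy.


if __name__ == "__main__":
    main(Operator)
```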
Note that any other event handler that runs and wants to set ActiveStatus has to ensure that all of that code ran successfully, which is equivalent to just running that code again, and puts us back to one event handler to rule them all.
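The demonstration decorator keeps statuses only in memory, but every Juju hook runs in a fresh process, so an entry recorded during one hook is gone by the next. The sketch below keeps per-handler statuses in a JSON file as a stand-in for persistent charm state (a real charm might use something like StoredState instead), lets each handler clear only its own entry, and reports the most severe outstanding status. The decorator name, file path, severity ordering, and simplified handler signatures are all hypothetical.

```python
import json
import os
import tempfile
from functools import wraps
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: a JSON file stands in for persistent charm state. Each Juju hook
# runs in a fresh process, so a plain module-level dict starts empty every time.
STATE_FILE = os.path.join(tempfile.gettempdir(), "handler-statuses.json")
SEVERITY = {"maintenance": 1, "waiting": 2, "blocked": 3}


def _load() -> Dict[str, List[str]]:
    if not os.path.exists(STATE_FILE):
        return {}
    with open(STATE_FILE) as fh:
        return json.load(fh)


def _save(statuses: Dict[str, List[str]]) -> None:
    with open(STATE_FILE, "w") as fh:
        json.dump(statuses, fh)


def aggregate(statuses: Dict[str, List[str]]) -> Tuple[str, str]:
    """Most severe outstanding status, or active when no handler reports one."""
    if not statuses:
        return ("active", "")
    worst = max((kind for kind, _ in statuses.values()), key=SEVERITY.__getitem__)
    messages = [msg for kind, msg in statuses.values() if kind == worst]
    return (worst, "; ".join(messages))


def scoped_status(f: Callable[..., Optional[Tuple[str, str]]]) -> Callable[..., Tuple[str, str]]:
    """Hypothetical decorator: the handler returns None (or an "active" tuple)
    when healthy, otherwise a (kind, message) tuple such as ("blocked", "...")."""

    @wraps(f)
    def wrapper(*args, **kwargs) -> Tuple[str, str]:
        statuses = _load()
        try:
            result = f(*args, **kwargs)
        except Exception as err:  # as in the original, errors become "blocked"
            result = ("blocked", str(err))
        if result is None or result[0] == "active":
            statuses.pop(f.__name__, None)  # clear only this handler's entry
        else:
            statuses[f.__name__] = list(result)
        _save(statuses)
        return aggregate(statuses)

    return wrapper


# Simplified handlers: they take a flag instead of (self, event).
@scoped_status
def on_config_changed(kind_ok: bool):
    return None if kind_ok else ("blocked", "config item kind must be set")


@scoped_status
def on_pilot_changed(has_data: bool):
    return None if has_data else ("waiting", "waiting for istio-pilot data")


_save({})  # start the demo from a clean slate
print(on_config_changed(False))  # ('blocked', 'config item kind must be set')
print(on_pilot_changed(False))   # ('blocked', 'config item kind must be set')
print(on_config_changed(True))   # ('waiting', 'waiting for istio-pilot data')
```

Returning the aggregated (kind, message) tuple instead of setting unit.status keeps the sketch runnable without ops; in a charm, the wrapper would map that tuple onto the matching status class. A handler that recovers no longer needs to re-verify everything else before reporting active.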
Issue Analytics: created 2 years ago; 1 reaction; 12 comments (12 by maintainers)
Top GitHub Comments
Given the complexity involved in charming stuff up, and in an attempt to keep things simple (the dependency pattern is not simple), afaik "one event to rule them all" is a very good tradeoff that just works, every time. Again, I would be very happy to be proven wrong on this one.
fwiw, alertmanager has a _common_exit_hook because:
- all alertmanager units need to know the address of at least one other unit, and prometheus needs to know them all (over relation data)
- RuntimeError: two objects claiming to be AlertmanagerCharm/on/start[16] have been created
- update_status anyway to complete the startup sequence (on a low-resource host, OF, the Operator Framework, still doesn't see the unit IP address after all startup events have fired; you can still see this occasionally in the alertmanager or prometheus integration tests)

I would love to see a good example of a complex charm that does not use a "main event gate".
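For comparison, here is a minimal sketch of the "main event gate" approach described in these comments: every handler does its own work and then calls one exit hook that re-checks all preconditions in order and is the only place that sets active. GatedCharm, its checks, and the relation data values are illustrative assumptions, not the ops API and not alertmanager's actual _common_exit_hook.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for a charm; not the ops API.
Check = Tuple[Callable[[], bool], str, str]


class GatedCharm:
    def __init__(self, config: dict, relation_data: Optional[dict], is_leader: bool) -> None:
        self.config = config
        self.relation_data = relation_data
        self.is_leader = is_leader
        self.status: Tuple[str, str] = ("maintenance", "")

    def _checks(self) -> List[Check]:
        # Ordered preconditions: (predicate, kind if it fails, message).
        return [
            (lambda: self.is_leader, "waiting", "Waiting for leadership"),
            (lambda: self.config.get("kind") in ("ingress", "egress"),
             "blocked", "Config item kind must be set"),
            (lambda: bool(self.relation_data),
             "blocked", "Waiting for istio-pilot relation data"),
        ]

    def _common_exit_hook(self) -> None:
        for ok, kind, message in self._checks():
            if not ok():
                self.status = (kind, message)
                return
        self.status = ("active", "")  # the only place active is ever set

    def on_config_changed(self, new_config: dict) -> None:
        self.config = new_config  # handler-specific work...
        self._common_exit_hook()  # ...then everything funnels through the gate

    def on_relation_changed(self, data: dict) -> None:
        self.relation_data = data
        self._common_exit_hook()


charm = GatedCharm(config={}, relation_data=None, is_leader=True)
charm.on_relation_changed({"service-name": "pilot", "service-port": "1234"})
print(charm.status)  # ('blocked', 'Config item kind must be set')
charm.on_config_changed({"kind": "ingress"})
print(charm.status)  # ('active', '')
```

The gate is simple and always consistent, at the cost of every handler re-running every check, which is exactly the tradeoff debated above.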