Support scoped event handler statuses

Setting self.model.unit.status from multiple event handlers doesn’t work in charms, because a unit has a single status and each handler overwrites whatever the others set. The Operator Framework needs to present some better way of handling this, or else charm authors are forced to write one event handler that responds to every event.

A simple example of the issue:

1. Event handler A sets BlockedStatus
2. Event handler B sets BlockedStatus
3. Event handler A runs successfully, sets ActiveStatus
4. Whoops, now event handler B’s BlockedStatus got lost
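
The sequence above can be reproduced without Juju at all. The following is a minimal sketch that uses hypothetical stand-ins (Status, FakeUnit, handler_a, handler_b) rather than the real ops classes, assuming only that a unit holds one status slot that each handler assigns to directly:

```python
# Stand-ins for the ops status classes; the names mirror ops.model but
# these are simplified, hypothetical stubs so the sketch runs anywhere.
class Status:
    def __init__(self, message=""):
        self.message = message

    def __repr__(self):
        return f"{type(self).__name__}({self.message!r})"


class ActiveStatus(Status):
    pass


class BlockedStatus(Status):
    pass


class FakeUnit:
    """Holds a single status slot, as assumed for a unit in this sketch."""

    def __init__(self):
        self.status = None


def handler_a(unit, config_ok):
    # Handler A only knows about its own precondition.
    unit.status = ActiveStatus() if config_ok else BlockedStatus("bad config")


def handler_b(unit, relation_ready):
    # Handler B only knows about its own precondition.
    unit.status = ActiveStatus() if relation_ready else BlockedStatus("missing relation")


unit = FakeUnit()
handler_a(unit, config_ok=False)       # A blocks
handler_b(unit, relation_ready=False)  # B blocks, overwriting A's message
handler_a(unit, config_ok=True)        # A recovers and sets Active
print(unit.status)  # B's BlockedStatus is gone although B is still blocked
```

Nothing in handler_a is wrong in isolation; the loss happens only because both handlers write to the same slot.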

See discussion here for more mitigation attempts, but everything proposed so far ultimately just moves the one-event-handler-to-rule-them-all into something else, such as an _is_ready or update_status helper function.

See also this discussion, which is about relation-level statuses. Allowing relation-level statuses is a pragmatic solution: it doesn’t solve the general case, but it does solve the case of one application install event handler plus N relation event handlers. This pattern applies to most or all of the K8s charms written today.
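
To make the "one install handler plus N relation handlers" case concrete, here is a rough sketch of what scoped statuses could look like. Everything in it is hypothetical: ScopedStatuses, the scope names, and the severity order in PRIORITY are assumptions for illustration, not an existing Operator Framework API.

```python
from dataclasses import dataclass

# Assumed severity order for this sketch: higher number wins.
PRIORITY = {"active": 0, "waiting": 1, "blocked": 2}


@dataclass(frozen=True)
class Status:
    name: str          # "active", "waiting", or "blocked"
    message: str = ""


class ScopedStatuses:
    """Tracks one status per scope and derives a single unit status."""

    def __init__(self):
        self._by_scope = {}

    def set(self, scope, status):
        # An active status means the scope has nothing to report.
        if status.name == "active":
            self._by_scope.pop(scope, None)
        else:
            self._by_scope[scope] = status

    def overall(self):
        if not self._by_scope:
            return Status("active")
        # Report the most severe status name, and join the messages of
        # every scope that shares that severity.
        worst = max(self._by_scope.values(), key=lambda s: PRIORITY[s.name]).name
        messages = [
            f"{scope}: {st.message}"
            for scope, st in sorted(self._by_scope.items())
            if st.name == worst
        ]
        return Status(worst, "; ".join(messages))


statuses = ScopedStatuses()
statuses.set("install", Status("blocked", "bad config"))
statuses.set("relation:istio-pilot", Status("waiting", "no data yet"))
statuses.set("install", Status("active"))  # install recovers
result = statuses.overall()
print(result)
```

Here the install scope recovering does not erase the relation scope's waiting status, which is the property the simple example lacks.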

As a concrete example of what the title means, this decorator allows for an event handler to only care about its own status, without having to worry about the state of the rest of the application:

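import subprocess
from functools import wraps

from jinja2 import Environment, FileSystemLoader
from ops.charm import CharmBase
from ops.model import ActiveStatus, BlockedStatus, WaitingStatus
# Assumed origin of get_interface and NoVersionsListed.
from serialized_data_interface import NoVersionsListed, get_interface

# Module-level so that every decorated handler shares one status registry.
STATUSES = {}


def handle_status_properly(f):
    """Updates status after decorated event handler is run.

    WARNING: For demonstration purposes only. Does not persist statuses.
    """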

    @wraps(f)
    def wrapper(self, event):
        try:
            status = f(self, event)
        except Exception as err:
            status = BlockedStatus(str(err))

        if status is None:
            if f.__name__ in STATUSES:
                del STATUSES[f.__name__]
        else:
            STATUSES[f.__name__] = status

        if STATUSES:
            status_type = type(list(STATUSES.values())[0])
            self.model.unit.status = status_type('; '.join([st.message for st in STATUSES.values()]))
        else:
            self.model.unit.status = ActiveStatus()
    return wrapper


class Operator(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)

        self.framework.observe(self.on.install, self.install)
        self.framework.observe(self.on["istio-pilot"].relation_changed, self.install)
        self.framework.observe(self.on.config_changed, self.install)

    @handle_status_properly
    def install(self, event):
        if not self.unit.is_leader():
            return WaitingStatus("Waiting for leadership")

        if self.model.config['kind'] not in ('ingress', 'egress'):
            return BlockedStatus('Config item `kind` must be set')

        if not self.model.relations['istio-pilot']:
            return BlockedStatus("Waiting for istio-pilot relation")

        try:
            pilot = get_interface(self, "istio-pilot")
        except NoVersionsListed as err:
            return WaitingStatus(str(err))

        if not pilot.get_data():
            return BlockedStatus("Waiting for istio-pilot relation data")

        pilot = list(pilot.get_data().values())[0]

        env = Environment(loader=FileSystemLoader('src'))
        template = env.get_template('manifest.yaml')
        rendered = template.render(
            kind=self.model.config['kind'],
            namespace=self.model.name,
            pilot_host=pilot['service-name'],
            pilot_port=pilot['service-port'],
        )

        subprocess.run(["./kubectl", "apply", "-f-"], input=rendered.encode('utf-8'), check=True)

Note that any other event handler that runs and wants to set ActiveStatus has to ensure that all of that code ran successfully, which is equivalent to just running that code, and puts us back into one event handler to rule them all.
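
The decorator's docstring warns that it does not persist statuses. If, as is usual for charms, each hook runs in a fresh process, the in-memory dictionary would start empty every time. A minimal sketch of one way around that, assuming a JSON file as the store (the file location and the load_statuses and record_status helpers are hypothetical; a real charm would more likely use the framework's own stored state):

```python
import json
import tempfile
from pathlib import Path


def load_statuses(path):
    """Returns the saved {handler_name: [kind, message]} mapping, or {}."""
    if path.exists():
        return json.loads(path.read_text())
    return {}


def record_status(path, handler_name, kind, message=""):
    """Saves or clears one handler's status and returns the full mapping."""
    statuses = load_statuses(path)
    if kind == "active":
        statuses.pop(handler_name, None)
    else:
        statuses[handler_name] = [kind, message]
    path.write_text(json.dumps(statuses))
    return statuses


# Hypothetical location; each call below stands in for a separate hook run.
store = Path(tempfile.mkdtemp()) / "statuses.json"
record_status(store, "install", "blocked", "bad config")
record_status(store, "relation_changed", "blocked", "missing relation")
remaining = record_status(store, "install", "active")
print(remaining)
```

Because every call re-reads the file, the blocked status recorded for relation_changed survives the later install run.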


Top GitHub Comments

sed-i commented, Nov 5, 2021

Due to the complexity involved in charming stuff up, and in an attempt to keep things simple (dependency pattern is not simple), afaik, “one event to rule them all” is a very good tradeoff and just works, every time. Again, I will be very happy to be proven wrong on this one.

sed-i commented, Nov 5, 2021

fwiw, alertmanager has a _common_exit_hook because all alertmanager units need to know the address of at least one other unit, and prometheus needs to know them all (over relation data):

- the combination of bug/1929364 and bug/1933303 requires charm code to hold off of some actions (upload layer and start service) before something else happens (OF sees the ip address is assigned to the unit).
- using defer in multiple places creates a race condition
- a complete use-case pattern, using defer + reemit, is not possible: reemit() is blocking + events stack, which causes: RuntimeError: two objects claiming to be AlertmanagerCharm/on/start[16] have been created.
- one “mega-event” helps a lot with idempotency
- BONUS: either way, I still find I have to rely on update_status anyway to complete the startup sequence (on a low resource host, OF still doesn’t see the unit IP address after all startup events fired - you can still see occasionally in alertmanager or prometheus integration tests)

I would love to see a good example of a complex charm that does not use a “main event gate”.
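
For readers unfamiliar with the "main event gate" pattern these comments describe, a rough sketch follows. FakeCharm and its methods are simplified stand-ins, not the ops API, and only the _common_exit_hook name is borrowed from the comment above; the point is that every event re-checks all preconditions from scratch.

```python
class FakeCharm:
    """Simplified stand-in for a charm; not the ops API."""

    def __init__(self):
        self.status = ("maintenance", "")
        self.config_valid = False
        self.relation_data = None
        self.applied = 0

    # Every event funnels into the same idempotent gate.
    def on_install(self):
        self._common_exit_hook()

    def on_config_changed(self, valid):
        self.config_valid = valid
        self._common_exit_hook()

    def on_relation_changed(self, data):
        self.relation_data = data
        self._common_exit_hook()

    def _common_exit_hook(self):
        # Re-check every precondition from scratch on each event.
        if not self.config_valid:
            self.status = ("blocked", "invalid config")
            return
        if not self.relation_data:
            self.status = ("waiting", "waiting for relation data")
            return
        self.applied += 1  # stands in for applying the workload
        self.status = ("active", "")


charm = FakeCharm()
charm.on_install()
charm.on_relation_changed({"host": "pilot"})
charm.on_config_changed(valid=True)
print(charm.status)
```

The status is always derived from the whole state, so no handler can clobber another's result; the cost is that all readiness logic lives in one place, which is the tradeoff the issue is objecting to.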