Make context less nebulous

Description

Currently, context is a dictionary. The dictionary is a bit nebulous, especially for new users, because it isn’t clear what is inside it, and autocomplete for arbitrary dictionaries leaves a lot to be desired.

Discoverability and type safety could be greatly improved if context were refactored into a dataclass (preferred) or a TypedDict (less ideal but probably easier).
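
For comparison, here is a minimal sketch of the TypedDict option. The class name `ContextDict`, its two keys, and the stand-in `TaskInstance` class are hypothetical and chosen only for illustration; they are not Airflow's actual definitions. A TypedDict is still a plain dict at runtime, so existing `context["ti"]` lookups would keep working while a type checker learns the key names.

```python
from typing import TypedDict


class TaskInstance:
    """Hypothetical stand-in for Airflow's TaskInstance."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id


class ContextDict(TypedDict):
    """Illustrative subset of context keys (assumed, not exhaustive)."""

    ti: TaskInstance  # the running task instance
    run_id: str  # identifier of the DAG run


def get_current_context() -> ContextDict:
    # Hypothetical implementation that returns a fixed context.
    return {"ti": TaskInstance("branch"), "run_id": "manual__1"}


context = get_current_context()
ti = context["ti"]  # a type checker infers TaskInstance here
print(type(context).__name__, ti.task_id)
```

The trade-off against a dataclass is that access stays string-keyed, so autocomplete depends on the editor understanding TypedDict keys.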

Use case / motivation

Avoid having to know the keys present in context

from typing import Dict

def get_current_context() -> Dict:
  pass  # body omitted; only the return type matters here


context = get_current_context()
ti = context["ti"]

would instead be


from dataclasses import dataclass

from airflow.models.taskinstance import TaskInstance
from pendulum import DateTime

@dataclass
class Context:
  # would be the place to see what values are in context and optionally include comments
  ti: TaskInstance  # this is the task instance
  ds: DateTime  # this is the execution date
  # etc...

def get_current_context() -> Context:
  pass  # body omitted; only the return type matters here

context = get_current_context()
# `context.` would have strong autocomplete here
ti = context.ti

Context Dataclass MVP V2

From later in the thread: a version that prevents a breaking change and warns on unsafe updates to context.

import logging
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Demo:  # context replacement
    id: str
    value_dc: int
    user_defined: Dict[str, Any] = field(default_factory=dict)

    def __getitem__(self, item):
        if item in self.__dict__:
            logging.warning(msg=f"dictionary interface getitem on context is deprecated; update to use the dataclass interface for standard fields like `{item}`")
            return self.__dict__[item]
        elif item in self.user_defined:
            logging.warning(msg=f"dictionary interface getitem on context is deprecated; update to use context.user_defined for custom fields like `{item}`")
            return self.user_defined[item]
        else:
            # include the missing key so the error is easier to trace
            raise KeyError(item)

    def __setitem__(self, key: str, value):
        if key in self.__dict__:
            msg = f"""dictionary interface setitem for standard fields is deprecated; update to use the dataclass interface for standard fields like `{key}`
            note: changing standard context fields is not supported and may have undefined behavior. If this is meant to be a custom field use context.user_defined instead"""
            logging.warning(msg=msg)
            self.__dict__[key] = value
        else:
            logging.warning(
                msg=f"dictionary interface setitem on context is deprecated; update to use context.user_defined for custom fields like `{key}`")
            self.user_defined[key] = value

    def keys(self):
        # added as an example to show how far we could go to have a non-breaking change for 2.1
        logging.warning(msg="dictionary interface keys is deprecated; update this to use the dataclass interface")
        # copy first so that merging user_defined does not mutate the instance's own __dict__
        temp = dict(self.__dict__)
        temp.update(self.user_defined)
        return temp


d = Demo(id="long_id", value_dc=1337)
print(d["id"])
d["new"] = 3
print(d["new"])
print(d.keys())
d["id"] = "warn"

returns

WARNING:root:dictionary interface getitem on context is deprecated; update to use the dataclass interface for standard fields like `id`
WARNING:root:dictionary interface setitem on context is deprecated; update to use context.user_defined for custom fields like `new`
WARNING:root:dictionary interface getitem on context is deprecated; update to use context.user_defined for custom fields like `new`
WARNING:root:dictionary interface keys is deprecated; update this to use the dataclass interface
WARNING:root:dictionary interface setitem for standard fields is deprecated; update to use the dataclass interface for standard fields like `id`
            note: changing standard context fields is not supported and may have undefined behavior. If this is meant to be a custom field use context.user_defined instead
long_id
3
{'id': 'long_id', 'value_dc': 1337, 'user_defined': {'new': 3}, 'new': 3}
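
One possible variation on the demo, offered as a sketch rather than the design the thread settled on: emit `DeprecationWarning` through the standard `warnings` module instead of `logging.warning`, so callers can filter the warnings or escalate them to errors in tests. The class name `DemoWithWarnings` and the shortened messages are hypothetical. It builds the merged key view from a fresh dict, leaving the instance attributes untouched.

```python
import warnings
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class DemoWithWarnings:  # hypothetical name, mirrors the Demo class
    id: str
    value_dc: int
    user_defined: Dict[str, Any] = field(default_factory=dict)

    def __getitem__(self, item: str) -> Any:
        warnings.warn(
            f"dict-style access to `{item}` is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
        standard = vars(self)
        if item in standard:
            return standard[item]
        if item in self.user_defined:
            return self.user_defined[item]
        raise KeyError(item)

    def keys(self):
        warnings.warn("keys() is deprecated", DeprecationWarning, stacklevel=2)
        # Merge into a new dict so the instance attributes stay untouched.
        merged = {**vars(self), **self.user_defined}
        return merged.keys()


d = DemoWithWarnings(id="long_id", value_dc=1337, user_defined={"new": 3})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = d["id"]
    custom = d["new"]
    names = list(d.keys())
print(value, custom, names, len(caught))
```

With `stacklevel=2` the warning points at the caller's line, which makes it easier to find the code that still uses the dictionary interface.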

My end goal is to be able to do the following when writing callables that can have context passed to them

from typing import Callable, List

def generate_is_latest_callable(tasks_if_latest: List[str], tasks_if_not_latest: List[str]) -> Callable:
  def result(context: Context) -> List[str]:
    # typing `context.` here should give strong autocomplete and typing in an IDE,
    # because currently all I can do is context: Dict[str, Any], which isn't very helpful
    if context.something:
      return tasks_if_latest
    else:
      return tasks_if_not_latest
  return result

t_branch = BranchPythonOperator(
  task_id="branch",
  python_callable=generate_is_latest_callable(["yes"], ["no"]),
  provide_context=True,
  dag=dag,
)

Are you willing to submit a PR?

This one touches too much of Airflow’s internals for me to try and tackle.

Related Issues

Not to my knowledge.

Slack Conversation: https://apache-airflow.slack.com/archives/C0146STM600/p1614103441043600?thread_ts=1614099549.042800&cid=C0146STM600

@kaxil

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 6
  • Comments: 40 (34 by maintainers)

Top GitHub Comments

2 reactions
ferruzzi commented, Nov 3, 2021

Looking at maybe taking this on. I also had a good bit of “what the heck is even IN context and what’s it for?” when I first started looking at the codebase, so I can see the benefit to new contributors and it’ll be a good one to learn, myself.

  1. Would the ideal solution be to make a new module under airflow.models to define the Context TypedDict and import it into taskinstance.py, or just define it at the top of airflow.models.taskinstance?

  2. If/once I do define the TypedDict, are any other changes needed in order to implement it? I have a rough implementation locally already and it seems to pass CI without any further changes, but want to make sure I’m not underestimating the scope of this.

  3. Context appears to contain some other Dicts (conf and var, specifically), should I chase that rabbit and define those as well while I am at it?

  4. It looks like we use both datetime.datetime and pendulum.DateTime in various places, which would we prefer to use here in the Context definition for the various timestamp fields? [[ Discussing here ]]
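
As a rough sketch of what question 3 could look like, a TypedDict can reference another TypedDict for a nested entry. The names `VarDict` and `Context`, the keys shown, and their types are assumptions for illustration only and are not taken from the Airflow codebase. `total=False` marks every key as optional, and `get_type_hints` shows how the definition doubles as a discoverable list of keys.

```python
from typing import Any, TypedDict, get_type_hints


class VarDict(TypedDict):
    """Hypothetical shape for the nested `var` entry."""

    json: Any
    value: Any


class Context(TypedDict, total=False):
    """Hypothetical subset of context; total=False makes every key optional."""

    run_id: str
    var: VarDict


# The annotations double as a discoverable list of keys.
hints = get_type_hints(Context)
print(sorted(hints))

context: Context = {"run_id": "manual__1", "var": {"json": None, "value": None}}
print(context["var"]["value"])
```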

1 reaction
potiuk commented, Mar 14, 2022

I think it’s long addressed.
