Make context less nebulous
Description
Currently, context is a dictionary. The dictionary is a bit nebulous, especially for new users: it isn't clear what is inside it, and autocomplete for arbitrary dictionaries leaves a lot to be desired.
Discoverability and type safety could be greatly improved if context were refactored into a dataclass (preferred) or a TypedDict (less ideal, but probably easier).
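For comparison, here is a minimal sketch of both options side by side. The two fields (ds and run_id) and their values are illustrative assumptions; the real context has many more keys. The dataclass gives attribute access and IDE autocomplete. The TypedDict is still a plain dict at runtime, so existing context["..."] lookups keep working while static type checkers can validate the keys.

```python
from dataclasses import dataclass
from typing import TypedDict


# Option 1, dataclass: attribute access and strong IDE autocomplete.
@dataclass
class ContextDataclass:
    ds: str  # execution date formatted as YYYY-MM-DD
    run_id: str


# Option 2, TypedDict: an ordinary dict at runtime, so existing
# context["key"] lookups keep working; only static checkers see the types.
class ContextTypedDict(TypedDict):
    ds: str
    run_id: str


dc = ContextDataclass(ds="2021-02-23", run_id="example_run")
td: ContextTypedDict = {"ds": "2021-02-23", "run_id": "example_run"}

print(dc.ds)                 # attribute access
print(td["ds"])              # dictionary access, unchanged from today
print(isinstance(td, dict))  # True: no runtime wrapper
```

The TypedDict route is "easier" mainly because nothing changes at runtime; the dataclass route changes how every caller accesses context, which is why it needs a compatibility layer.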
Use case / motivation
Avoid having to know the keys present in context
from typing import Any, Dict

def get_current_context() -> Dict[str, Any]:
    pass

context = get_current_context()
ti = context["ti"]  # the caller has to know that the "ti" key exists
would instead be:
from dataclasses import dataclass

from airflow.models import TaskInstance
from pendulum import DateTime

@dataclass
class Context:
    # the single place to see which values are in context, optionally with comments
    ti: TaskInstance  # the task instance
    execution_date: DateTime  # the execution date
    ds: str  # the execution date formatted as YYYY-MM-DD
    # etc...

def get_current_context() -> Context:
    pass

context = get_current_context()
# `context.` would have strong autocomplete here
ti = context.ti
Context dataclass MVP v2 (from later in the thread): avoids a breaking change and warns on unsafe updates to context
import logging
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class Demo:  # context replacement
    id: str
    value_dc: int
    user_defined: Dict[str, Any] = field(default_factory=dict)

    def __getitem__(self, item):
        if item in self.__dict__:
            logging.warning(f"dictionary interface getitem on context is deprecated; update to use the dataclass interface for standard fields like `{item}`")
            return self.__dict__[item]
        elif item in self.user_defined:
            logging.warning(f"dictionary interface getitem on context is deprecated; update to use context.user_defined for custom fields like `{item}`")
            return self.user_defined[item]
        else:
            raise KeyError(item)

    def __setitem__(self, key: str, value):
        if key in self.__dict__:
            msg = f"""dictionary interface setitem for standard fields is deprecated; update to use the dataclass interface for standard fields like `{key}`
note: changing standard context fields is not supported and may have undefined behavior. If this is meant to be a custom field use context.user_defined instead"""
            logging.warning(msg)
            self.__dict__[key] = value
        else:
            logging.warning(f"dictionary interface setitem on context is deprecated; update to use context.user_defined for custom fields like `{key}`")
            self.user_defined[key] = value

    def keys(self):
        # added as an example to show how far we could go to have a non-breaking change for 2.1
        logging.warning("dictionary interface keys is deprecated; update this to use the dataclass interface")
        temp = dict(self.__dict__)  # copy, so the instance's own __dict__ is not modified
        temp.update(self.user_defined)
        return temp

d = Demo(id="long_id", value_dc=1337)
print(d["id"])
d["new"] = 3
print(d["new"])
print(d.keys())
d["id"] = "warn"
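Output (logging writes the WARNING lines to stderr, so they appear grouped apart from the values printed to stdout):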
WARNING:root:dictionary interface getitem on context is deprecated; update to use the dataclass interface for standard fields like `id`
WARNING:root:dictionary interface setitem on context is deprecated; update to use context.user_defined for custom fields like `new`
WARNING:root:dictionary interface getitem on context is deprecated; update to use context.user_defined for custom fields like `new`
WARNING:root:dictionary interface keys is deprecated; update this to use the dataclass interface
WARNING:root:dictionary interface setitem for standard fields is deprecated; update to use the dataclass interface for standard fields like `id`
note: changing standard context fields is not supported and may have undefined behavior. If this is meant to be a custom field use context.user_defined instead
long_id
3
{'id': 'long_id', 'value_dc': 1337, 'user_defined': {'new': 3}, 'new': 3}
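A variation on the same compatibility idea, sketched under the assumption that basing the class on Python's collections.abc.Mapping is acceptable: implementing only __getitem__, __iter__ and __len__ yields keys(), items(), get(), the in operator and ** unpacking for free. It uses warnings.warn with DeprecationWarning instead of logging, so users can filter or escalate the deprecation. DemoContext and its fields are illustrative, mirroring the Demo class above; this is not Airflow's implementation.

```python
import warnings
from collections.abc import Mapping
from dataclasses import dataclass, field, fields
from typing import Any, Dict, Iterator, List


@dataclass
class DemoContext(Mapping):
    id: str
    value_dc: int
    user_defined: Dict[str, Any] = field(default_factory=dict)

    def _standard_names(self) -> List[str]:
        # Standard fields, excluding the user_defined container itself.
        return [f.name for f in fields(self) if f.name != "user_defined"]

    def __getitem__(self, key: str) -> Any:
        # Every dictionary-style lookup emits a DeprecationWarning.
        warnings.warn(
            f"dictionary access to context[{key!r}] is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
        if key in self._standard_names():
            return getattr(self, key)
        return self.user_defined[key]  # raises KeyError for unknown keys

    def __iter__(self) -> Iterator[str]:
        yield from self._standard_names()
        yield from self.user_defined

    def __len__(self) -> int:
        return len(self._standard_names()) + len(self.user_defined)


ctx = DemoContext(id="long_id", value_dc=1337, user_defined={"new": 3})
print(ctx.id)     # preferred attribute access, no warning: long_id
print(list(ctx))  # ['id', 'value_dc', 'new']
print(dict(ctx))  # {'id': 'long_id', 'value_dc': 1337, 'new': 3}; each lookup warns
```

Unlike Demo, this sketch is read-only through the dictionary interface: ctx["id"] = "x" raises TypeError, so writes must go through attributes or user_defined explicitly. Whether that is acceptable for a non-breaking 2.x change is a judgment call.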
My end goal is to be able to do the following when writing callables that have context passed to them:
from typing import Callable, List

from airflow.operators.python import BranchPythonOperator

# Context is the proposed dataclass; dag is assumed to be defined elsewhere
def generate_is_latest_callable(tasks_if_latest: List[str], tasks_if_not_latest: List[str]) -> Callable:
    def result(context: Context) -> List[str]:
        # typing `context.` here should give strong autocomplete and typing in an IDE,
        # because currently all I can do is context: Dict[str, Any], which isn't very helpful
        if context.something:
            return tasks_if_latest
        else:
            return tasks_if_not_latest
    return result

t_branch = BranchPythonOperator(
    task_id="branch",
    python_callable=generate_is_latest_callable(["yes"], ["no"]),
    provide_context=True,  # deprecated and no longer required as of Airflow 2.0
    dag=dag,
)
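The end goal can be tried out without Airflow by using a stand-in dataclass. Here Context and its is_latest field are hypothetical placeholders for the proposed context and for context.something above; the point is only that the inner callable's parameter is typed, so an IDE can autocomplete context. and a type checker can flag misspelled fields.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Context:
    # Hypothetical stand-in for the proposed context dataclass.
    is_latest: bool


def generate_is_latest_callable(
    tasks_if_latest: List[str], tasks_if_not_latest: List[str]
) -> Callable[[Context], List[str]]:
    def result(context: Context) -> List[str]:
        # context.is_latest is known to the IDE and the type checker here.
        if context.is_latest:
            return tasks_if_latest
        return tasks_if_not_latest

    return result


choose = generate_is_latest_callable(["yes"], ["no"])
print(choose(Context(is_latest=True)))   # ['yes']
print(choose(Context(is_latest=False)))  # ['no']
```

Annotating the return type as Callable[[Context], List[str]] rather than a bare Callable also lets the checker verify what the branch callable accepts and returns.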
Are you willing to submit a PR?
This one touches too much of Airflow's internals for me to try to tackle.
Related Issues
Not to my knowledge.
Slack conversation: https://apache-airflow.slack.com/archives/C0146STM600/p1614103441043600?thread_ts=1614099549.042800&cid=C0146STM600
Top GitHub Comments
Looking at maybe taking this on. I also had a good bit of "what the heck is even IN context and what's it for?" when I first started looking at the codebase, so I can see the benefit to new contributors, and it'll be a good one for me to learn from, myself.
Would the ideal solution be to make a new module under airflow.models to define the Context TypedDict and import it into taskinstance.py, or just define it at the top of airflow.models.taskinstance?
If/once I do define the TypedDict, are any other changes needed in order to implement it? I have a rough implementation locally already, and it seems to pass CI without any further changes, but I want to make sure I'm not underestimating the scope of this.
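One way the TypedDict option could look, as a minimal sketch: the field list is an illustrative subset and an assumption, as is where the class lives (a small dedicated module imported by taskinstance.py would work the same way). total=False marks every key as optional, which suits a context whose keys depend on how the task runs. Because a TypedDict is an ordinary dict at runtime, it only affects static type checking, which may be why a rough implementation passes CI without other changes.

```python
from datetime import datetime
from typing import Any, Dict, TypedDict


class Context(TypedDict, total=False):
    # Illustrative subset of keys; the real context has many more.
    ds: str                   # execution date formatted as YYYY-MM-DD
    execution_date: datetime  # pendulum.DateTime subclasses datetime, so it could be narrowed
    params: Dict[str, Any]


def render(context: Context) -> str:
    # .get() is the safe access pattern for keys declared with total=False.
    return f"ds={context.get('ds', 'unknown')}"


ctx: Context = {"ds": "2021-02-23", "params": {}}
print(render(ctx))        # ds=2021-02-23
print(type(ctx) is dict)  # True: no runtime behavior change
```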
Context appears to contain some other dicts (conf and var, specifically). Should I chase that rabbit and define those as well while I am at it?
It looks like we use both datetime.datetime and pendulum.DateTime in various places. Which would we prefer to use in the Context definition for the various timestamp fields?
I think it's long addressed.