Implement __getattr__ as a prefect.Task magic method
Current behavior
Currently, prefect.Task implements a number of operators that implicitly add extra tasks to the current flow context. These are known internally as “Magic Methods”. __getitem__ is a good example of this:
import prefect

@prefect.task
def return_dict():
    return {"a": "b"}

with prefect.Flow("getitem_test") as f:
    dict_task = return_dict()
    value_task = dict_task["a"]
The line value_task = dict_task["a"] will implicitly create a new task (value_task) that runs __getitem__ on the return value of dict_task, thus returning “b”.
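As a rough illustration of the mechanism (a toy stand-in, not Prefect's actual implementation), the sketch below shows how overloading __getitem__ can record a deferred step instead of indexing anything right away. The Node class and its run method are hypothetical names used only for this example.

```python
class Node:
    """Toy stand-in for a task: holds a zero-argument function to run later."""

    def __init__(self, fn, name):
        self.fn = fn
        self.name = name

    def run(self):
        return self.fn()

    def __getitem__(self, key):
        # Indexing does not touch any data; it only builds a new node
        # that will index this node's result when the graph is run.
        return Node(lambda: self.run()[key], name=f"{self.name}[{key!r}]")


dict_node = Node(lambda: {"a": "b"}, name="return_dict")
value_node = dict_node["a"]  # builds a node, nothing is evaluated yet

print(value_node.name)   # return_dict['a']
print(value_node.run())  # b
```

The indexing expression returns another Node rather than a value, which is why such expressions can be chained while the flow is being defined.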
Proposed behavior
prefect.Task should also implement __getattr__, which calls a custom Task whose __call__ method is overridden. Here is a simple implementation:
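class MyTask(prefect.Task):
    def __getattr__(self, attr):
        # called only when normal attribute lookup on the task fails;
        # defers the lookup to a new task that runs on this task's result
        getattr_task = MyGetattr().bind(obj=self, attr=attr)
        return getattr_task

class MyGetattr(MyTask):
    def run(self, obj, attr):
        # getattr() uses normal lookup, so it also works for objects
        # that do not define __getattr__ themselves
        return getattr(obj, attr)

    def __call__(self, *args, **kwargs):
        # calling the attribute adds one more task that invokes it with these arguments
        return prefect.task(lambda x: x(*args, **kwargs)).bind(self)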
This would be extremely useful for implicitly accessing properties or calling methods of the return value of MyTask.run(). Since Python calls __getattr__ only when normal attribute lookup fails (attr is found neither in the instance's __dict__ nor on its class), it will not shadow any properties/methods already implemented by prefect.Task.
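The fallback behavior of __getattr__ can be checked in plain Python, without Prefect. The following is a minimal sketch; the Wrapper class and its attributes are hypothetical and exist only to show which lookups reach __getattr__.

```python
class Wrapper:
    """Toy object with real attributes and a __getattr__ fallback."""

    def __init__(self):
        self.name = "wrapper"      # stored in the instance __dict__
        self.fallback_calls = []   # records every name that reached __getattr__

    def describe(self):            # found on the class
        return "a real method"

    def __getattr__(self, attr):
        # Reached only after normal lookup has failed.
        self.fallback_calls.append(attr)
        return f"deferred access to {attr!r}"


w = Wrapper()
print(w.name)            # wrapper
print(w.describe())      # a real method
print(w.drop)            # deferred access to 'drop'
print(w.fallback_calls)  # ['drop']
```

The flip side is that any name already present on the wrapper, such as name here, never reaches the fallback, so a same-named attribute of a wrapped object would need some other access path.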
Example
import pandas as pd

# monkey patch a couple of preexisting "magic methods"
# (obviously, if prefect.Task natively implemented __getattr__, this would not be necessary)
prefect.tasks.core.operators.GetItem.__getattr__ = MyTask.__getattr__
prefect.tasks.core.function.FunctionTask.__getattr__ = MyTask.__getattr__

class PandasTask(MyTask):
    def run(self):
        return pd.DataFrame({"x": range(0, 10), "y": range(20, 30)})

with prefect.Flow(name="dataframe_example") as f:
    # return a basic Pandas dataframe
    df = PandasTask()

    # implicitly call the drop method of the dataframe; df_dropped will contain
    # rows 5 through the end
    df_dropped = df.drop(range(0, 5))

    # implicitly access the column "x" from the dataframe
    col_x = df.x

    # implicitly access column "y" via the loc object of the dataframe. note that loc's custom
    # __getitem__ implementation automatically works
    col_y = df.loc[:, "y"]

    # of course, these tasks can be infinitely chained, implicitly adding new tasks to the DAG
    y_dropped = col_y.drop(range(0, 5)).transpose()
    # note: range(10, -1) is empty, so this call drops no rows
    x_dropped = col_x.drop(range(10, -1))
I am happy to submit a PR to add this functionality — it should be fairly trivial to implement.
Issue Analytics
- Created 3 years ago
- Comments: 12 (6 by maintainers)
I agree that we should first focus on your suggested .data attribute (since we’d need it as a fallback for shadowed attributes anyway). I suspect there are subtleties I’m not yet seeing with __getattr__, but thus far in my own implementation it seems to be working without any issues. Perhaps rigorously verifying that it has no ill effects could be a “moonshot” goal, and something for a separate issue thread.
I’m not sure I agree with this. I really like the fact that all magic methods behave as if the task object and return value are equivalent, e.g.
If, as you suggest, the return value would now be accessed in task.data (so task.data["xyz"] would replace task["xyz"]), shouldn’t the same logic apply to all the other magics? I think it would be confusing from a UX perspective for __getitem__ to behave differently from everything else. The above code would then become
which is a step backwards in simplicity, IMO. The only need for introducing .data to access attributes of a returned object is because the . operator is overloaded to access native Task attributes; AFAIK [] (and all the other operators) do not have similar overloading issues.
I only listed dataframes as an example of where this would be useful (and indeed, for that one specific use case, Dask is great!) However, I think this would be handy for any Prefect task returning an object whose attributes are then passed to downstream tasks. For example,
Having first-class support for accessing attributes of returned objects is much cleaner than the current alternative, which would require having to create a special handler class that unwraps objects, e.g.
Having to create such a wrapper class that explicitly unpacks the desired attributes of an object into something accessible by prefect.Task.__getitem__ is cumbersome, and makes the workflow definition un-Pythonic. I’m a huge fan of Prefect because unlike any other workflow definition API I’ve come across, Prefect flow definitions (mostly) read like standard imperative Python, due to the many magic methods already implemented.
Fair enough. In my application (which wraps Prefect), I’ll continue to add in this functionality via monkey patching. Thanks for the discussion!
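For readers trying to picture the wrapper-style workaround mentioned in the discussion, here is a rough, hypothetical sketch in plain Python (no Prefect required). It is not the code originally posted in the thread: unpack_attributes and the SimpleNamespace result are made up for illustration. In a flow, a helper like this would presumably be wrapped in a task so that downstream tasks could use [] on its result.

```python
from types import SimpleNamespace


def unpack_attributes(obj, names):
    """Copy the named attributes of obj into a dict so they can be reached with []."""
    return {name: getattr(obj, name) for name in names}


# hypothetical object returned by some upstream step
result = SimpleNamespace(host="localhost", port=5432)

unpacked = unpack_attributes(result, ["host", "port"])
print(unpacked["host"])  # localhost
print(unpacked["port"])  # 5432
```

Every attribute a downstream step needs has to be listed by hand, which is the overhead the proposed __getattr__ support is meant to remove.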