Implement __getattr__ as a prefect.Task magic method

Current behavior

Currently, prefect.Task implements a number of operators that implicitly add extra tasks to the current flow context. These are known internally as “Magic Methods”.

__getitem__ is a good example of this:

import prefect

@prefect.task
def return_dict():
  return { "a" : "b" }

with prefect.Flow("getitem_test") as f:
  dict_task = return_dict()
  value_task = dict_task["a"]

value_task = dict_task["a"] will implicitly create a new task (value_task) that runs __getitem__ on the return value of dict_task, thus returning “b”.
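To make the mechanism concrete, here is a minimal sketch of the idea in plain Python. ToyTask is a hypothetical stand-in written for illustration, not Prefect's actual Task class: indexing it returns a new deferred object instead of touching any data.

```python
class ToyTask:
    """Hypothetical stand-in for a task; not Prefect's implementation."""

    def __init__(self, fn, *upstream):
        self.fn = fn              # callable that produces this task's result
        self.upstream = upstream  # tasks whose results are passed to fn

    def __getitem__(self, key):
        # Indexing does no work now; it returns a new deferred task that
        # will index into this task's result when the graph is run.
        return ToyTask(lambda value: value[key], self)

    def run(self):
        # Resolve upstream results first, then apply this task's function.
        return self.fn(*(task.run() for task in self.upstream))


dict_task = ToyTask(lambda: {"a": "b"})
value_task = dict_task["a"]   # a new ToyTask; nothing has run yet
result = value_task.run()     # the lookup happens only here
```

The point of the sketch is only that the operator builds a new node in the graph, and the actual lookup is postponed until run time.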

Proposed behavior

prefect.Task should also implement __getattr__, which calls a custom Task whose __call__ method is overridden. Here is a simple implementation:

class MyTask(prefect.Task):
  def __getattr__(self, attr):
    getattr_task = MyGetattr().bind(obj = self, attr = attr)
    return getattr_task

class MyGetattr(MyTask):
  def run(self, obj, attr):
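    # obj is the upstream task's return value at run time; the getattr builtin
    # also works for objects that do not define __getattr__ themselves
    return getattr(obj, attr)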

  def __call__(self, *args, **kwargs):
    return prefect.task(lambda x : x(*args, **kwargs)).bind(self)

This would be extremely useful for implicitly accessing properties or calling methods of the return value of MyTask.run(). Since Python only calls __getattr__ when normal attribute lookup fails (the name is found neither on the instance nor on its class), it will not interfere with any properties/methods already implemented by prefect.Task.

Example

import pandas as pd

# monkey patch a couple preexisting "magic methods" 
# (obviously, if prefect.Task natively implemented __getattr__, this would not be necessary)
prefect.tasks.core.operators.GetItem.__getattr__ = MyTask.__getattr__
prefect.tasks.core.function.FunctionTask.__getattr__ = MyTask.__getattr__

class PandasTask(MyTask):
    def run(self):
        return pd.DataFrame({ "x" : range(0, 10), "y" : range(20, 30) })

with prefect.Flow(name = "dataframe_example") as f:
    # return a basic Pandas dataframe
    df = PandasTask()

    # implicitly call the drop method of the dataframe; df_dropped will contain
    # rows 5 through the end
    df_dropped = df.drop(range(0, 5))

    # implicitly access the column "x" from the dataframe
    col_x = df.x

    # implicitly access column "y" via the loc object of the dataframe. note that loc's custom
    # __getitem__ implementation automatically works
    col_y = df.loc[:, "y"]

    # of course, these tasks can be infinitely chained, implicitly adding new tasks to the DAG
    y_dropped = col_y.drop(range(0, 5)).transpose()
    x_dropped = col_x.drop(range(10, -1))

I am happy to submit a PR to add this functionality — it should be fairly trivial to implement.

Top GitHub Comments

julianhess commented, Jul 24, 2020

> This would add enormous complexity - warnings, dict checks, shadows, different behaviors across tasks with different “true” attributes. I already suspect that getitem is too magically implemented already; I don’t think we can introduce such a magic access on getattr as well. I would strongly recommend df.data.xyz as the universal access (possibly for df.data['xyz'] as well).

I agree that we should first focus on your suggested .data attribute (since we’d need it as a fallback for shadowed attributes anyway). I suspect there are subtleties I’m not yet seeing with __getattr__, but thus far in my own implementation it seems to be working without any issues. Perhaps rigorously verifying that it has no ill effects could be a “moonshot” goal, and something for a separate issue thread.

> possibly for df.data['xyz'] as well

I’m not sure I agree with this. I really like the fact that all magic methods behave as if the task object and return value are equivalent, e.g.

@task
def return_num(x):
  return x

with Flow("add_example") as f:
  a = return_num(1)
  b = return_num(2)
  c = a + b # by implicitly creating a new task, Task.__add__ acts on the return values of a and b, not the Task objects they actually are

If as you suggest the return value would now be accessed in task.data (so task.data["xyz"] would replace task["xyz"]), shouldn’t the same logic apply to all the other magics? I think it would be confusing from a UX perspective for __getitem__ to behave differently from everything else. The above code would then become

with Flow("add_example") as f:
  a = return_num(1)
  b = return_num(2)
  c = a.data + b.data

which is a step backwards in simplicity, IMO. The only reason to introduce .data for accessing attributes of a returned object is that the . operator already resolves native Task attributes; AFAIK [] (and all the other operators) do not have a similar conflict.
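As a rough illustration of the .data idea discussed here, the sketch below uses hypothetical Deferred and _DataProxy classes (neither exists in Prefect). Attribute access through .data always refers to the eventual return value, so a native attribute such as name cannot shadow it.

```python
from types import SimpleNamespace


class Deferred:
    """Hypothetical deferred value; not Prefect's Task class."""

    def __init__(self, fn, *upstream, name="deferred"):
        self.fn = fn
        self.upstream = upstream
        self.name = name  # "native" attribute that would shadow result.name

    @property
    def data(self):
        # Explicit namespace for attributes of the eventual result.
        return _DataProxy(self)

    def run(self):
        return self.fn(*(dep.run() for dep in self.upstream))


class _DataProxy:
    def __init__(self, owner):
        self._owner = owner

    def __getattr__(self, attr):
        # Every lookup here targets the result, so nothing is shadowed.
        return Deferred(
            lambda value: getattr(value, attr),
            self._owner,
            name=f"getattr_{attr}",
        )


source = Deferred(lambda: SimpleNamespace(name="result name", a=1), name="source task")
native_name = source.name              # the wrapper's own attribute
result_name = source.data.name.run()   # the return value's attribute
result_a = source.data.a.run()
```

This only demonstrates the namespace separation; it says nothing about how such an accessor would interact with the other operators.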

> Jumping in with one additional consideration - Prefect doesn’t add much value at that level of granularity (e.g., what does it mean to retry an attribute access or to have a stateful dependency on attribute access?). If you want to build deferred computational graphs like this, I highly recommend dask (which has a first-class delayed / distributed dataframe concept).

I only listed dataframes as an example of where this would be useful (and indeed, for that one specific use case, Dask is great!) However, I think this would be handy for any Prefect task returning an object whose attributes are then passed to downstream tasks. For example,

import prefect

class Example:
  def __init__(self, a, b):
    self.a = a
    self.b = b
    self.c = a + b

@prefect.task
def return_object(a, b):
  return Example(a, b)

@prefect.task
def add_three(a, b, c):
  return a + b + c

with prefect.Flow("attr_example") as f:
  foo = return_object(1, 1) # foo.a = 1, foo.b = 1, foo.c = 2
  sum_foo = add_three(foo.a, foo.b, foo.c) # would add 1 + 1 + 2 if __getattr__ were implemented
  sum_const = add_three(1, 2, 3) # adds 1 + 2 + 3

Having first-class support for accessing attributes of returned objects is much cleaner than the current alternative, which requires an extra task that unwraps the object, e.g.

@prefect.task
def obj_extractor(obj):
  return obj.__dict__

with prefect.Flow("attr_example") as f:
  foo = return_object(1, 1) # foo.a = 1, foo.b = 1, foo.c = 2

  # sum_foo = add_three(foo.a, foo.b, foo.c)
  # ^^^^ won't work; the only easy way to access the attributes of 
  # foo's return value is with an extra wrapper class that converts it to a dict:
  foo_ext = obj_extractor(foo)
  sum_foo = add_three(foo_ext["a"], foo_ext["b"], foo_ext["c"])

  sum_const = add_three(1, 2, 3) # adds 1 + 2 + 3

Having to create such a wrapper task that explicitly unpacks the desired attributes of an object into something accessible by prefect.Task.__getitem__ is cumbersome and makes the workflow definition un-Pythonic. I’m a huge fan of Prefect because, unlike any other workflow definition API I’ve come across, Prefect flow definitions (mostly) read like standard imperative Python, thanks to the many magic methods already implemented.
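One further limitation of a __dict__-based extractor can be seen in plain Python, assuming a variant of Example in which c is a property rather than an instance attribute (a change made here purely for illustration). Properties live on the class, so they are missing from the instance __dict__, whereas getattr still resolves them.

```python
class ExampleWithProperty:
    """Illustrative variant of Example; c is computed instead of stored."""

    def __init__(self, a, b):
        self.a = a
        self.b = b

    @property
    def c(self):
        # Defined on the class, so it never appears in the instance __dict__.
        return self.a + self.b


foo = ExampleWithProperty(1, 1)
extracted = dict(foo.__dict__)    # what a __dict__-based extractor would return
has_c = "c" in extracted          # the property is not included
via_getattr = getattr(foo, "c")   # attribute lookup still finds it
```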

julianhess commented, Sep 19, 2020

Fair enough. In my application (which wraps Prefect), I’ll continue to add in this functionality via monkey patching. Thanks for the discussion!
