Unconditional gather of operands causes unexpected exception.
Description
This may sound a little ridiculous, but bear with me for a minute. It seems there is no way for a flow to reference an object that starts out serializable but later becomes unserializable. I think this is because all task results are eventually gathered and returned by Flow.run(). This causes an exception to be raised even if the mutated object is never referenced in the flow after the mutation. It seems to happen regardless of how the object entered the flow: constructed in the flow context, returned by a task, or passed in as a Parameter.

In Prefect 0.7.1 this does not raise an exception if the object is constructed within the flow context. In that case the flow contains a task: <Task: Constant[Mutator]>. Running with 0.9.2 there is no corresponding task, which may violate the goal of avoiding implicit dependencies?
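The core of the problem can be reproduced without Prefect or Dask, using only the standard library `pickle` module. This sketch assumes, as the traceback further down suggests, that Dask ultimately relies on pickling to ship task results back to the client. It uses a class shaped like the `Mutator` in the reproduction: a freshly constructed instance round-trips through `pickle`, but once `mutate()` attaches a live generator it no longer does. The helper `is_picklable` is hypothetical and exists only for this illustration.

```python
import pickle


def gen_gen():
    yield 20
    yield 30


class Mutator(object):
    def __init__(self):
        self.value = 1

    def mutate(self):
        # Attaching a live generator makes the instance unpicklable.
        self.generator = gen_gen()


def is_picklable(obj):
    """Hypothetical helper: True if obj survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (TypeError, pickle.PicklingError):
        return False


m = Mutator()
before = is_picklable(m)  # only an int attribute so far
m.mutate()
after = is_picklable(m)   # the generator attribute cannot be pickled
print(before, after)      # prints: True False
```

The exact error message differs across Python versions ("can't pickle generator objects" on 3.7), but in each case it is a `TypeError`.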
This could be a crazy edge case or an unanticipated use case, but it does have some interesting implications even for ordinary cases. For example, the default behavior is to gather all operands to the submitting machine, which I've found really convenient for debugging, until it crashes the node that submitted the job in the first place!
Expected Behavior
Don’t gather the object after it has been mutated?
Reproduction
#!/usr/bin/env python
import sys, os

# Flow specific
from prefect import task, Flow
from prefect.engine.executors import DaskExecutor

from site_specific import get_dask_client


def start_executor():
    client = get_dask_client()
    executor = DaskExecutor(client.scheduler_info()['address'], debug=True)
    return executor


def gen_gen():
    yield 20
    yield 30


class Mutator(object):
    def __init__(self):
        self.value = 1

    def mutate(self):
        self.generator = gen_gen()


@task
def mutate(mutated):
    mutated.mutate()
    return True


@task
def make_mutator():
    return Mutator()


def main():
    with Flow("mutate_flow") as f:
        # This also raises the exception
        # mutator = Mutator()
        mutator = make_mutator()
        mutated = mutate(mutator)

    f.run(executor=start_executor())


if __name__ == "__main__":
    sys.exit(main())
This gives us the following stack trace:
[2020-02-21 14:18:24,410] INFO - prefect.FlowRunner | Starting flow run.
distributed.protocol.core - CRITICAL - Failed to deserialize
Traceback (most recent call last):
  File "lib/python3.7/site-packages/distributed/protocol/core.py", line 124, in loads
    value = _deserialize(head, fs, deserializers=deserializers)
  File "lib/python3.7/site-packages/distributed/protocol/serialize.py", line 268, in deserialize
    return loads(header, frames)
  File "lib/python3.7/site-packages/distributed/protocol/serialize.py", line 80, in serialization_error_loads
    raise TypeError(msg)
TypeError: Could not serialize object of type Success
Traceback (most recent call last):
  File "lib/python3.7/site-packages/distributed/protocol/pickle.py", line 38, in dumps
    result = pickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL)
_pickle.PicklingError: Can't pickle <class '__main__.Mutator'>: attribute lookup Mutator on __main__ failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "lib/python3.7/site-packages/distributed/protocol/serialize.py", line 191, in serialize
    header, frames = dumps(x, context=context) if wants_context else dumps(x)
  File "lib/python3.7/site-packages/distributed/protocol/serialize.py", line 58, in pickle_dumps
    return {"serializer": "pickle"}, [pickle.dumps(x)]
  File "lib/python3.7/site-packages/distributed/protocol/pickle.py", line 51, in dumps
    return cloudpickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL)
  File "lib/python3.7/site-packages/cloudpickle/cloudpickle.py", line 1125, in dumps
    cp.dump(obj)
  File "lib/python3.7/site-packages/cloudpickle/cloudpickle.py", line 482, in dump
    return Pickler.dump(self, obj)
  File "lib/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  File "lib/python3.7/pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "lib/python3.7/pickle.py", line 662, in save_reduce
    save(state)
  File "lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "lib/python3.7/pickle.py", line 890, in _batch_setitems
    save(v)
  File "lib/python3.7/pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "lib/python3.7/pickle.py", line 662, in save_reduce
    save(state)
  File "lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "lib/python3.7/pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "lib/python3.7/pickle.py", line 662, in save_reduce
    save(state)
  File "lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "lib/python3.7/pickle.py", line 524, in save
    rv = reduce(self.proto)
TypeError: can't pickle generator objects
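The last frame points at the root cause: the generator stored on the mutated `Mutator` cannot be pickled, so the `Success` state wrapping it cannot be shipped back to the client. One possible workaround, sketched here as an assumption and not necessarily the solution discussed later in this thread, is to let the class control its own pickled state through the standard `__getstate__`/`__setstate__` hooks and drop the generator. This version also initializes `generator` to `None` in `__init__`, a change from the original class. The gathered copy comes back with `generator` set to `None`, so this only makes sense if that copy is never used to resume iteration.

```python
import pickle


def gen_gen():
    yield 20
    yield 30


class Mutator(object):
    def __init__(self):
        self.value = 1
        self.generator = None  # assumption: declared up front, unlike the original

    def mutate(self):
        self.generator = gen_gen()

    def __getstate__(self):
        # Copy the instance dict and replace the unpicklable generator.
        state = self.__dict__.copy()
        state["generator"] = None
        return state

    def __setstate__(self, state):
        # Restore the remaining attributes; generator stays None.
        self.__dict__.update(state)


m = Mutator()
m.mutate()
restored = pickle.loads(pickle.dumps(m))
print(restored.value, restored.generator)  # prints: 1 None
```

The original object is untouched by pickling, so its generator can still be consumed locally; only the serialized copy loses it.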
Environment
I’m using prefect 0.9.1 and distributed 2.10.
That sounds like a good solution to me. Thanks! Feel free to close this as not-a-bug or whatever makes sense to you!
I’ll close, but feel free to comment back here if you run into further issues!