Task output does not seem to be cached (despite `cache_for` argument being present)
(I’m using prefect version 0.5.5.)
Here is a simple flow containing a single task that has the `cache_for` argument set.
I would expect this flow to take ~2s the first time it runs and then pretty much complete instantaneously on subsequent runs.
from prefect import task, Flow
from datetime import timedelta
import time

@task(cache_for=timedelta(hours=1))
def return_the_answer():
    time.sleep(2)
    return 42

with Flow("answer") as flow:
    return_the_answer()

flow.run()
Here is the output from running in a Jupyter notebook using `%%time`, which confirms that the flow takes ~2s to complete.
[2019-07-10 20:45:09,227] INFO - prefect.FlowRunner | Beginning Flow run for 'answer'
[2019-07-10 20:45:09,229] INFO - prefect.FlowRunner | Starting flow run.
[2019-07-10 20:45:09,235] INFO - prefect.TaskRunner | Task 'return_the_answer': Starting task run...
[2019-07-10 20:45:11,240] INFO - prefect.TaskRunner | Task 'return_the_answer': finished task run for task with final state: 'Cached'
[2019-07-10 20:45:11,241] INFO - prefect.FlowRunner | Flow run SUCCESS: all reference tasks succeeded
CPU times: user 14.6 ms, sys: 4.24 ms, total: 18.8 ms
Wall time: 2.02 s
<Success: "All reference tasks succeeded.">
The strange thing is that subsequent runs take exactly the same amount of time, whereas I’d expect the result to be cached and the runs to complete much more quickly.
It’s quite possible that this is a misunderstanding on my part rather than a bug. Am I supplying the `cache_for` argument in the wrong place? (I couldn’t find an example in the docs which applies it to tasks defined via the `@task` decorator.) Any clarifications would be appreciated. Many thanks!
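To make the expectation concrete, here is a minimal sketch in plain Python of the behaviour I had in mind: the first call does the slow work, and later calls within the cache window return the stored result. This is not how Prefect implements `cache_for`; the `cache_result_for` decorator and its `clock` parameter are hypothetical names made up for illustration, and the sleep is shortened to 0.2 s.

```python
import time
from datetime import timedelta
from functools import wraps


def cache_result_for(duration: timedelta, clock=time.monotonic):
    """Hypothetical decorator: reuse a result until `duration` has elapsed."""

    def decorator(fn):
        cached = {}  # holds "value" and "expires_at" once fn has run

        @wraps(fn)
        def wrapper():
            if cached and clock() < cached["expires_at"]:
                return cached["value"]  # still fresh: skip the slow call
            cached["value"] = fn()
            cached["expires_at"] = clock() + duration.total_seconds()
            return cached["value"]

        return wrapper

    return decorator


@cache_result_for(timedelta(hours=1))
def return_the_answer():
    time.sleep(0.2)  # stand-in for the 2 s of work in the flow above
    return 42


start = time.monotonic()
first = return_the_answer()
first_elapsed = time.monotonic() - start

start = time.monotonic()
second = return_the_answer()
second_elapsed = time.monotonic() - start

print(first, second, second_elapsed < first_elapsed)
```

The cache here lives in memory inside the decorated function, so it survives repeated calls in one process but not a restart. Whether a cache should survive across separate `flow.run()` calls is exactly the question raised above.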
Issue Analytics
- Created 4 years ago
- Comments: 6 (3 by maintainers)
That’s great news! 😃
I just tried it with your branch and can confirm it works as expected.
Kudos once again for the quick response and the solution, much appreciated!
@maxalbert I’ve implemented a version of this in #1226 - assuming others approve, once it’s merged you should be good to go! Note the new `cache_key` attribute on Tasks as well, which allows you to share caches amongst different tasks and even different flows.
Thanks for the issue, this was surprisingly easy to implement and I think will provide a large lift for other users as well!