Task output does not seem to be cached (despite `cache_for` argument being present)

(I’m using prefect version 0.5.5.)

Here is a simple flow containing a single task that has the cache_for argument set.

I would expect this flow to take ~2s the first time it runs and then pretty much complete instantaneously on subsequent runs.

from prefect import task, Flow
from datetime import timedelta
import time

@task(cache_for=timedelta(hours=1))
def return_the_answer():
    time.sleep(2)
    return 42

with Flow("answer") as flow:
    return_the_answer()

flow.run()

Here is the output from running in a Jupyter notebook using %%time, which confirms that the flow takes ~2s to complete.

[2019-07-10 20:45:09,227] INFO - prefect.FlowRunner | Beginning Flow run for 'answer'
[2019-07-10 20:45:09,229] INFO - prefect.FlowRunner | Starting flow run.
[2019-07-10 20:45:09,235] INFO - prefect.TaskRunner | Task 'return_the_answer': Starting task run...
[2019-07-10 20:45:11,240] INFO - prefect.TaskRunner | Task 'return_the_answer': finished task run for task with final state: 'Cached'
[2019-07-10 20:45:11,241] INFO - prefect.FlowRunner | Flow run SUCCESS: all reference tasks succeeded
CPU times: user 14.6 ms, sys: 4.24 ms, total: 18.8 ms
Wall time: 2.02 s
<Success: "All reference tasks succeeded.">

The strange thing is that subsequent runs take exactly the same amount of time, even though I’d expect the result to be cached and those runs to complete much more quickly.
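As an illustration of the behaviour expected here, the following is a minimal plain-Python sketch of time-limited caching. It is not Prefect code and does not show how Prefect implements `cache_for`; the decorator name `cache_for_sketch` and the `calls` list are hypothetical, introduced only for this example, and the sleep is shortened so the sketch runs quickly.

```python
import time
from datetime import datetime, timedelta
from functools import wraps


def cache_for_sketch(duration):
    """Hypothetical stand-in for the expected `cache_for` behaviour (not Prefect's API)."""
    def decorator(fn):
        cached = {}  # filled with "value" and "expires" after the first real run

        @wraps(fn)
        def wrapper():
            if cached and datetime.now() < cached["expires"]:
                return cached["value"]  # still fresh: skip the real work
            cached["value"] = fn()
            cached["expires"] = datetime.now() + duration
            return cached["value"]

        return wrapper
    return decorator


calls = []  # records every time the function body actually executes


@cache_for_sketch(timedelta(hours=1))
def return_the_answer():
    calls.append(1)
    time.sleep(0.2)  # shortened from the 2 s used in the flow above
    return 42


first = return_the_answer()   # runs the body and stores the result
second = return_the_answer()  # served from the cache, no sleep
```

In this sketch the second call returns without executing the function body, which is the kind of speed-up expected from repeated `flow.run()` calls.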

It’s quite possible that this is a misunderstanding on my part rather than a bug. Am I supplying the cache_for argument in the wrong place? (I couldn’t find an example in the docs which applies it to tasks defined via the @task decorator.) Any clarifications would be appreciated. Many thanks!

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
maxalbert commented, Jul 11, 2019

That’s great news! 😃

I just tried it with your branch and can confirm it works as expected.

Kudos once again for the quick response and the solution, much appreciated!

0 reactions
cicdw commented, Jul 11, 2019

@maxalbert I’ve implemented a version of this in #1226 - assuming others approve, once it’s merged you should be good to go! Note the new cache_key attribute on Tasks as well, which allows you to share caches amongst different tasks and even different flows.

Thanks for the issue, this was surprisingly easy to implement and I think will provide a large lift for other users as well!
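
The idea of sharing a cache between tasks through a common key, as mentioned in the comment above, can be pictured with the rough sketch below. This is plain Python under stated assumptions, not the implementation from #1226 and not Prefect's API: `SHARED_CACHE`, `run_with_cache`, `task_a`, and `task_b` are hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical process-wide store: cache key -> (value, expiry time).
SHARED_CACHE = {}


def run_with_cache(fn, cache_key, cache_for):
    """Run fn unless a still-valid value is already stored under cache_key."""
    now = datetime.now()
    entry = SHARED_CACHE.get(cache_key)
    if entry is not None and now < entry[1]:
        return entry[0]  # another task already produced a fresh value
    value = fn()
    SHARED_CACHE[cache_key] = (value, now + cache_for)
    return value


executed = []  # records which task bodies actually ran


def task_a():
    executed.append("a")
    return 42


def task_b():
    executed.append("b")
    return 42


# Both tasks use the same key, so only the first one does any work.
result_a = run_with_cache(task_a, "the-answer", timedelta(hours=1))
result_b = run_with_cache(task_b, "the-answer", timedelta(hours=1))
```

Because both calls use the key `"the-answer"`, the second task reuses the value stored by the first instead of running its own body.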
