IntervalSchedule with start_date in the past

Is there a way to trigger backfilling?

I’d like to be able to run a flow against past intervals. I set up this demo flow, but it only schedules runs for the future.

"""Demo flow."""

import datetime
import random
import time

from prefect import Flow, task
from prefect.engine.executors import DaskExecutor
from prefect.schedules import IntervalSchedule


@task
def generate_data():
    data = []
    for _ in range(10):
        time.sleep(random.random())
        data.append(random.random())
    return data


@task
def combine(datasets):
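    # Pair up the i-th value of every dataset; return a list rather than a
    # lazy zip iterator so the result can be shipped between Dask workers.
    return list(zip(*datasets))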


every_minute = IntervalSchedule(
    # Backfill (start dates in the past) not working as expected.
    start_date=datetime.datetime.utcnow() - datetime.timedelta(weeks=1),
    interval=datetime.timedelta(minutes=1),
)

with Flow("demo", schedule=every_minute) as flow:
    d1 = generate_data()
    d2 = generate_data()
    d3 = generate_data()
    d4 = generate_data()
    result = combine([d1, d2, d3, d4])
    print(result)

executor = DaskExecutor(address="tcp://dask-scheduler:8786")
flow.run(executor=executor)

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 16 (5 by maintainers)

Top GitHub Comments

cicdw commented, Jun 26, 2019 (2 reactions)

Awesome, thanks for the added context - that makes sense!

In that case, in order to run your flow on historical data I’d recommend:

  1. inferring the “historical time” from Prefect Context - if you rely on one of those date references for the analysis, then it will be updated with each new run
  2. in order to “backfill” in this scenario, you can loop over the various dates / times you want to analyze and override the relevant context values with the context keyword argument to flow.run (user provided values will override the defaults)

Alternatively, depending on whether you truly need this to run on a proper schedule, you could introduce a Parameter that represents the experiment (or batch of experiments) you wish to analyze, and kick the flow off manually with various values of this Parameter.
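
The backfill loop described in step 2 could be sketched roughly as follows. This is only an illustration using the standard library: past_intervals, backfill, and fake_run are hypothetical names that do not come from Prefect, and fake_run stands in for whatever actually triggers one run of the flow.

```python
import datetime
from typing import Callable, Iterator, List


def past_intervals(
    start: datetime.datetime,
    end: datetime.datetime,
    interval: datetime.timedelta,
) -> Iterator[datetime.datetime]:
    """Yield start, start + interval, ... for every tick strictly before end."""
    if interval <= datetime.timedelta(0):
        raise ValueError("interval must be positive")
    current = start
    while current < end:
        yield current
        current += interval


def backfill(
    run_once: Callable[[datetime.datetime], object],
    start: datetime.datetime,
    end: datetime.datetime,
    interval: datetime.timedelta,
) -> List[object]:
    """Call run_once for each past tick, oldest first, and collect the results."""
    return [run_once(tick) for tick in past_intervals(start, end, interval)]


# Hypothetical stand-in for a single flow run: it only records the timestamp
# it was handed instead of executing any tasks.
def fake_run(scheduled_start_time: datetime.datetime) -> str:
    return scheduled_start_time.isoformat()


start = datetime.datetime(2019, 6, 1)
results = backfill(
    fake_run,
    start,
    start + datetime.timedelta(days=3),
    datetime.timedelta(days=1),
)
print(results)
```

With a real flow, run_once would presumably wrap a call such as flow.run(context={"scheduled_start_time": tick}), following the suggestion above; that wiring is an assumption here and has not been checked against a specific Prefect version.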

cicdw commented, Sep 23, 2020 (1 reaction)

Hi @snenkov for example:

Using context:

import pendulum
import prefect
from prefect import Flow

@prefect.task
def do_something_time_specific():
    current_time = prefect.context.get("scheduled_start_time")
    if isinstance(current_time, str):
        current_time = pendulum.parse(current_time)
    # does something dealing with time

flow = Flow("backfill", tasks=[do_something_time_specific])

flow.run() # will use current timestamp
flow.run(context={"scheduled_start_time": "1986-01-02"}) # will use old timestamp

Using a Parameter:

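import pendulum
import prefect
from prefect import Flow, Parameter

# required=False lets flow.run() proceed without a value on Prefect versions
# that treat a Parameter whose default is None as required.
current_time = Parameter("timestamp", default=None, required=False)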

@prefect.task
def do_something_time_specific(current_time):
    if current_time is None:
        current_time = prefect.context.get("scheduled_start_time")
    if isinstance(current_time, str):
        current_time = pendulum.parse(current_time)
    # does something dealing with time

with Flow("backfill") as flow:
    do_something_time_specific(current_time)

flow.run() # will use current timestamp
flow.run(parameters={"timestamp": "1986-01-02"}) # will use old timestamp; the key must match the Parameter name

There are analogous things you can do if running against a Prefect backend (as both context and Parameters can be provided to each run), and also for runs that have schedules.

Also note that this PR will make Parameter typing more natural for datetime objects.

