IntervalSchedule with start_date in the past
Is there a way to trigger backfilling?
I’d like to be able to run a flow against past intervals. I set up this demo flow, but it only schedules runs for the future.
"""Demo flow."""
import datetime
import random
import time

from prefect import Flow, task
from prefect.engine.executors import DaskExecutor
from prefect.schedules import IntervalSchedule


@task
def generate_data():
    data = []
    for _ in range(10):
        time.sleep(random.random())
        data.append(random.random())
    return data


@task
def combine(datasets):
    return zip(datasets)


every_minute = IntervalSchedule(
    # Backfill (start dates in the past) not working as expected.
    start_date=datetime.datetime.utcnow() - datetime.timedelta(weeks=1),
    interval=datetime.timedelta(minutes=1),
)

with Flow("demo", schedule=every_minute) as flow:
    d1 = generate_data()
    d2 = generate_data()
    d3 = generate_data()
    d4 = generate_data()
    result = combine([d1, d2, d3, d4])
    print(result)

executor = DaskExecutor(address="tcp://dask-scheduler:8786")
flow.run(executor=executor)
Issue Analytics
- State:
- Created 4 years ago
- Comments:16 (5 by maintainers)
Awesome, thanks for the added context - that makes sense!
In that case, in order to run your flow on historical data I’d recommend supplying the historical values through the context keyword argument to flow.run (user-provided values will override the defaults).
Alternatively, depending on whether you truly need this to run on a proper schedule, you could introduce a Parameter that represents the experiment (or batch of experiments) you wish to analyze, and kick the flow off manually with various values of this Parameter.
Hi @snenkov, for example:
Using context:
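A minimal sketch of the context approach, under some assumptions: one run is started per historical interval, and the interval's timestamp is passed through the context keyword argument. The helper names past_intervals and backfill_with_context are hypothetical, and run stands in for any callable that accepts context=..., such as a wrapper around flow.run. The scheduled_start_time key is an assumption as well; any key your tasks read from the context would work the same way.

```python
import datetime
from typing import Any, Callable, Iterator, List


def past_intervals(
    start: datetime.datetime,
    end: datetime.datetime,
    interval: datetime.timedelta,
) -> Iterator[datetime.datetime]:
    """Yield the start of every interval in [start, end)."""
    if interval <= datetime.timedelta(0):
        raise ValueError("interval must be positive")
    current = start
    while current < end:
        yield current
        current += interval


def backfill_with_context(
    run: Callable[..., Any],
    start: datetime.datetime,
    end: datetime.datetime,
    interval: datetime.timedelta,
) -> List[Any]:
    """Call run(context=...) once per historical interval, oldest first."""
    results = []
    for ts in past_intervals(start, end, interval):
        # Assumption: tasks look up "scheduled_start_time" in the context
        # to decide which slice of historical data to process.
        results.append(run(context={"scheduled_start_time": ts}))
    return results
```

For a flow built without a schedule, this could be driven with something like backfill_with_context(lambda context: flow.run(executor=executor, context=context), start, end, datetime.timedelta(minutes=1)). The runs happen sequentially, which keeps the backfill order deterministic.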
Using a Parameter:
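A minimal sketch of the Parameter approach, assuming the flow declares a Parameter named run_date (a hypothetical name) and that run is a callable accepting parameters=..., such as a wrapper around flow.run. The date is passed as an ISO string here so the value stays easy to serialize; that is a choice made for this sketch, not a requirement stated in the thread.

```python
import datetime
from typing import Any, Callable, Dict, Iterable


def backfill_with_parameter(
    run: Callable[..., Any],
    dates: Iterable[datetime.date],
    name: str = "run_date",  # hypothetical Parameter name
) -> Dict[datetime.date, Any]:
    """Call run(parameters=...) once per date and map each date to its result."""
    results = {}
    for day in dates:
        # Each manual run receives a different value of the same Parameter.
        results[day] = run(parameters={name: day.isoformat()})
    return results
```

A call might look like backfill_with_parameter(lambda parameters: flow.run(parameters=parameters), dates), where dates lists the historical days to process and the task that loads the data takes the run_date Parameter as its input.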
There are analogous things you can do if running against a Prefect backend (as both context and Parameters can be provided to each run), and also for runs that have schedules.
Also note that this PR will make Parameter typing more natural for datetime objects.