Clarify how schedule_interval works

Hi,

This is a dummy example that consists of 4 tasks, back to back, all attached to the same DAG, events_redshift. I’ve set schedule_interval to 1 minute for now because I am trying to see it execute, but that’s not a real-life setting. This is running with the CeleryExecutor and PostgreSQL.

"""
Extracts events from S3 and loads them into Redshift.
"""

from airflow import DAG 
from airflow.operators import DummyOperator
from datetime import datetime
from datetime import timedelta


default_args = { 
    'owner': 'airflow',
    'start_date': datetime(2015, 8, 5, 8, 4), 
    'schedule_interval': timedelta(minutes=1),
    'retry_delay': timedelta(minutes=1),
}

dag = DAG('events_redshift', default_args=default_args)

t_download_from_s3 = DummyOperator(
    task_id='download_from_s3',
    dag=dag,
)

t_cleanup = DummyOperator(
    task_id='cleanup',
    dag=dag,
)

t_upload_to_s3 = DummyOperator(
    task_id='upload_to_s3',
    dag=dag,
)

t_load_to_redshift = DummyOperator(
    task_id='load_to_redshift',
    dag=dag,
)

t_cleanup.set_upstream(t_download_from_s3)
t_upload_to_s3.set_upstream(t_cleanup)
t_load_to_redshift.set_upstream(t_upload_to_s3)

I can see the DAG in the web UI, but the only way to get it to execute the tasks is by clicking on it and running it manually, as you can see with download_from_s3.

[Screenshot of the web UI, not preserved]

This is the celery worker: [screenshot not preserved]

And the scheduler’s output, refreshing every 5 seconds.

[Screenshot of the scheduler output, not preserved]

My expectation is that the DAG should run every minute, with each task executed back to back, but neither of these is happening.

So I guess my question is: do I have the wrong expectation, and what am I doing wrong?

Thanks a lot for your help!

Issue Analytics

  • State: closed
  • Created 8 years ago
  • Comments: 15 (8 by maintainers)

Top GitHub Comments

mistercrunch commented, Aug 7, 2015

Oh, I should have caught this earlier. The issue is that your DAG is actually a daily DAG: at the moment, schedule_interval is taken from the argument you pass to the DAG object, not from default_args, as in

dag = DAG("my_dag_id", schedule_interval=timedelta(hours=1))

I need to clarify that in the docs / API.
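
To make the point above concrete, here is a minimal sketch using only the standard library. `ToyDAG` and `first_schedule_points` are hypothetical names made up for this illustration and are not Airflow's implementation. The sketch assumes, based on the comment above, that the DAG falls back to a one-day interval when no schedule_interval is passed to it directly, and that keys inside default_args are not consulted for the DAG's own schedule.

```python
from datetime import datetime, timedelta


class ToyDAG:
    """Hypothetical stand-in for a DAG object; NOT Airflow's code."""

    def __init__(self, dag_id, schedule_interval=timedelta(days=1),
                 default_args=None):
        self.dag_id = dag_id
        # Assumption: the interval comes only from the DAG's own keyword
        # argument, defaulting to one day (a "daily DAG").
        self.schedule_interval = schedule_interval
        # default_args is only stored as defaults for tasks; this model
        # never reads a schedule from it.
        self.default_args = default_args or {}


def first_schedule_points(dag, start, count=3):
    """Return `count` datetimes spaced by the DAG's interval, from `start`."""
    return [start + i * dag.schedule_interval for i in range(count)]


default_args = {
    'owner': 'airflow',
    'start_date': datetime(2015, 8, 5, 8, 4),
    'schedule_interval': timedelta(minutes=1),  # ignored by the DAG here
    'retry_delay': timedelta(minutes=1),
}
start = default_args['start_date']

# As posted in the question: the interval only lives in default_args.
as_posted = ToyDAG('events_redshift', default_args=default_args)

# With the fix from the comment: the interval is passed to the DAG itself.
fixed = ToyDAG('events_redshift',
               schedule_interval=timedelta(minutes=1),
               default_args=default_args)

print(as_posted.schedule_interval)  # 1 day, 0:00:00
print(fixed.schedule_interval)      # 0:01:00
print(first_schedule_points(fixed, start))
```

The sketch only spaces points by the interval; it does not model when the real scheduler decides to trigger a run. Applied to the DAG in the question, the equivalent change would be to pass schedule_interval=timedelta(minutes=1) to DAG('events_redshift', ...) instead of placing it in default_args.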

mistercrunch commented, Aug 7, 2015

Updated the tutorials / docs here: https://github.com/airbnb/airflow/pull/238
