Airflow schedules tasks in wrong timezone with an MSSQL Metadata DB on a non-UTC server

Apache Airflow version

2.2.2

What happened

Airflow schedules a task an hour earlier than expected, when using an MSSQL metadata database where the DB server is set to the CET timezone. The screenshot below shows the DAG starting an hour before the end of the data interval.

[Screenshot: DAG run starting an hour before the end of its data interval]

What you expected to happen

Airflow schedules the task at the correct time in UTC.

How to reproduce

It’s hard to describe a complete reproducible method since it relies on having an MSSQL Server with particular settings.

A relevant DAG would be as simple as:

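from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator

# Weekdays at 09:00; the naive start_date and cron schedule use default_timezone.
with DAG(
    dag_id="example_dag",
    start_date=datetime(2021, 1, 1),
    schedule_interval="0 9 * * 1-5",
) as dag:
    task = DummyOperator(task_id="dummy")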

And Airflow config of:

[core]
default_timezone = utc

This DAG would then be scheduled an hour earlier than expected.

Operating System

Redhat UBI 8

Versions of Apache Airflow Providers

No response

Deployment

Other Docker-based deployment

Deployment details

Airflow scheduler and webserver each running in a docker container based on Redhat UBI 8. Metadata DB is MSSQL Server running on a Windows Server where the server timezone is CET.

Anything else

In our installation, the problem is happening for any DAG with a UTC based schedule.

I believe the root cause is this line of code: https://github.com/apache/airflow/blob/6405d8f804e7cbd1748aa7eed65f2bbf0fcf022e/airflow/models/dag.py#L2872

On MSSQL, func.now() appears to correspond to GETDATE(), which returns the current time in the timezone of the DB server. But next_dagrun_create_after is stored in the database as UTC (in a datetime2 column, which doesn’t include timezone information). So this line of code is effectively asking “Has the current time in CET reached the next creation time in UTC?”, meaning that a DAG that should start at 09:00 UTC starts at 09:00 CET (08:00 UTC) instead, one hour early.
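The mismatch can be reproduced without a database. The following is a minimal sketch using only the Python standard library; the helper names (db_now, is_due), the fixed +01:00 offset standing in for CET, and the sample timestamps are assumptions made for illustration and are not taken from the Airflow code base.

```python
from datetime import datetime, timedelta, timezone

# Fixed +01:00 offset standing in for CET (assumption: daylight saving is ignored).
CET = timezone(timedelta(hours=1), "CET")

# Stored the way a datetime2 column holds it: UTC wall-clock time, no tzinfo.
next_dagrun_create_after = datetime(2021, 1, 4, 9, 0)


def db_now(real_utc_now, server_tz):
    """Naive wall-clock time as reported by a DB server running in server_tz."""
    return real_utc_now.astimezone(server_tz).replace(tzinfo=None)


def is_due(create_after, now_naive):
    """Same shape as the scheduler filter: create_after <= now."""
    return create_after <= now_naive


# The real time is 08:00 UTC, which is 09:00 on a CET wall clock.
real_now = datetime(2021, 1, 4, 8, 0, tzinfo=timezone.utc)

due_with_cet_clock = is_due(next_dagrun_create_after, db_now(real_now, CET))
due_with_utc_clock = is_due(next_dagrun_create_after, db_now(real_now, timezone.utc))

print(due_with_cet_clock)  # True: the run is created an hour early
print(due_with_utc_clock)  # False: a UTC clock waits until 09:00 UTC
```

Because both sides of the comparison are naive, nothing flags the mismatch; the only visible symptom is the shifted start time.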

I can verify that func.now() returns CET with the SQLAlchemy code engine.execute(sa.select([sa.func.now()])).fetchall().

I think the correct way to get the current time in UTC on MSSQL is GETUTCDATE().
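One way to express this in SQLAlchemy is a custom construct that compiles to GETUTCDATE() on MSSQL and falls back to func.now() everywhere else. This is only a sketch, assuming SQLAlchemy is installed; the class name UtcNow is hypothetical, and it is not necessarily how the issue was or will be fixed in Airflow.

```python
from sqlalchemy import func
from sqlalchemy.dialects import mssql, sqlite
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import FunctionElement
from sqlalchemy.types import DateTime


class UtcNow(FunctionElement):
    """Hypothetical construct: the current time in UTC, evaluated by the DB."""

    type = DateTime()
    inherit_cache = True


@compiles(UtcNow)
def _utcnow_default(element, compiler, **kw):
    # Other backends keep whatever func.now() already renders for them.
    return compiler.process(func.now(), **kw)


@compiles(UtcNow, "mssql")
def _utcnow_mssql(element, compiler, **kw):
    # GETUTCDATE() does not depend on the DB server's timezone setting.
    return "GETUTCDATE()"


# Render the construct for two dialects without connecting to a database.
mssql_sql = str(UtcNow().compile(dialect=mssql.dialect()))
sqlite_sql = str(UtcNow().compile(dialect=sqlite.dialect()))
print(mssql_sql)
print(sqlite_sql)
```

With something like this in place, the filter could compare next_dagrun_create_after against UtcNow() instead of func.now(), leaving the query unchanged on other backends.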

We ran Airflow 1.10 previously without seeing this problem. From what I can tell, in that version the date comparison is done on the application side rather than in the DB.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 2
  • Comments: 17 (15 by maintainers)

Top GitHub Comments

MansiVerma777 commented, Jun 14, 2022 (2 reactions)

Hi, we are also encountering the same issue with Airflow. We are using a SQL Server backend that is not running in the UTC timezone. Since the DB query that creates the DAG runs that need to be scheduled uses CURRENT_TIMESTAMP, we are seeing a scheduling lag of 7 hours, because the database runs in a timezone that is 7 hours behind UTC. Any ETA for when the fix will be available?

potiuk commented, Jan 27, 2022 (2 reactions)

I think a db-specific case will be better (if simple). We already have ~500 deps in Airflow in total (including transitive ones), and while adding one more seems like no biggie, adding a ‘util’ in Airflow seems more “straightforward”.
