Programmatic customization of run_id for scheduled DagRuns

Description

Allow DAG authors to control how run_ids are generated for newly created DagRuns. Currently the only way to specify a DagRun's run_id is the manual trigger workflow, passing run_id through either the CLI or the API. It would be great if DAG authors were able to write custom logic to generate run_ids for scheduled DagRunIntervals.

Use case/motivation

In Airflow 1.x, the semantics of execution_date were burdensome enough that DAG authors would subclass DAG and override create_dagrun, so that new DagRuns were created with run_ids that gave context about what the DagRun actually represented. For example,

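def create_dagrun(self, **kwargs):
  # Build the run_id from the execution_date plus the date (and weekday) the run actually starts.
  execution_date = kwargs['execution_date']
  run_date = self.following_schedule(execution_date).date()
  kwargs['run_id'] = f"{execution_date.isoformat()}__{run_date:%A_%Y-%m-%d}"
  return super().create_dagrun(**kwargs)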

would make the UI DagRun dropdown display the weekday on which the DAG actually ran.

After upgrading to Airflow 2.0, with DAG serialization in the scheduler, overridden methods are no longer present on the SerializedDAG, so we are back to having scheduled__<execution_date> values in the UI dropdown. It would be great if some functionality could be exposed, either through the DAG or just in the UI, to display meaningful values in the DagRun dropdown.

Related issues

No response

Are you willing to submit a PR?

Yes I am willing to submit a PR!

Code of Conduct

I agree to follow this project’s Code of Conduct

Issue Analytics

State:
Created 2 years ago
Reactions: 2
Comments: 8 (5 by maintainers)

Top GitHub Comments

2 reactions

uranusjr commented, Apr 11, 2022

One possible solution would be to allow a DAG argument run_id_format for Airflow to use instead, and the user can set this to whatever they want (within some limitations of course). Maybe a Python .format() syntax or Jinja2.
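
To make the suggestion above concrete, here is a minimal sketch of how a Python .format() template could be rendered into a run_id. The names RUN_ID_FORMAT and render_run_id, and the template fields run_type and logical_date, are assumptions made for illustration; none of this is an existing Airflow API, and a real implementation would also need to validate the template and guarantee unique run_ids.

```python
from datetime import datetime, timezone

# Hypothetical per-DAG template in str.format() syntax (not an Airflow setting).
RUN_ID_FORMAT = "{run_type}__{logical_date:%Y-%m-%d}__{logical_date:%A}"


def render_run_id(run_id_format: str, run_type: str, logical_date: datetime) -> str:
    """Render a run_id by filling the template with the run's type and date."""
    return run_id_format.format(run_type=run_type, logical_date=logical_date)


example = render_run_id(
    RUN_ID_FORMAT,
    run_type="scheduled",
    logical_date=datetime(2022, 4, 11, tzinfo=timezone.utc),
)
print(example)
```

Because the template is a plain string, it would survive DAG serialization, which is the constraint that broke the create_dagrun override described in the issue.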

0 reactions

ashb commented, Aug 17, 2022

I’m looking at making this a thing that the timetable controls.

(That does mean running timetable code in the scheduler “hot loop” so needs some careful thought)
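
As a rough illustration of the timetable-controlled idea, the sketch below uses stand-in classes rather than Airflow's own. Interval, BaseTimetableSketch, WeekdayTimetableSketch and the generate_run_id signature are all hypothetical names chosen for this example. It only shows the shape of the idea: the run_id is a pure function of the run type and the scheduled interval, which keeps the work done in the scheduler loop small.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Interval:
    """Stand-in for a scheduled interval (hypothetical, not Airflow's class)."""
    start: datetime
    end: datetime


class BaseTimetableSketch:
    """Hypothetical base class: default run_id mirrors the scheduled__<date> style."""

    def generate_run_id(self, run_type: str, interval: Interval) -> str:
        return f"{run_type}__{interval.start.isoformat()}"


class WeekdayTimetableSketch(BaseTimetableSketch):
    """Names each run after the day its interval closes, i.e. when the run starts."""

    def generate_run_id(self, run_type: str, interval: Interval) -> str:
        return f"{run_type}__{interval.end:%Y-%m-%d_%A}"


start = datetime(2022, 8, 16, tzinfo=timezone.utc)
interval = Interval(start=start, end=start + timedelta(days=1))

default_id = BaseTimetableSketch().generate_run_id("scheduled", interval)
weekday_id = WeekdayTimetableSketch().generate_run_id("scheduled", interval)
print(default_id)
print(weekday_id)
```

Keeping the hook free of I/O and database access is one way to limit the cost of running user-supplied timetable code in the scheduler, though whether that is sufficient is exactly the open question raised in the comment.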
