Programmatic customization of run_id for scheduled DagRuns

Description

Allow DAG authors to control how run_ids are generated for newly created DagRuns. Currently the only way to specify a DagRun's run_id is to trigger the run manually, through the CLI or the API, and pass in run_id. It would be great if DAG authors could write custom logic to generate run_ids from scheduled DagRunIntervals.

Use case/motivation

In Airflow 1.x, the semantics of execution_date were burdensome enough that DAG authors would subclass DAG and override create_dagrun, so that new DagRuns were created with run_ids that gave context about what the DagRun actually covered. For example,

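def create_dagrun(self, **kwargs):
  # Build a string run_id from the execution date and the date the run actually starts.
  execution_date = kwargs['execution_date']
  run_date = self.following_schedule(execution_date).date()
  kwargs['run_id'] = f"{execution_date.isoformat()}__{run_date.isoformat()}"
  return super().create_dagrun(**kwargs)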

would make the DagRun dropdown in the UI display the weekday on which the DAG actually ran.

After upgrading to Airflow 2.0, with DAG serialization in the scheduler, overridden methods are no longer present in the SerializedDAG, so we are back to having scheduled__<execution_date> values in the UI dropdown. It would be great if some functionality could be exposed, either through the DAG or just in the UI, to display meaningful values in the DagRun dropdown.

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

uranusjr commented, Apr 11, 2022 (2 reactions)

One possible solution would be to allow a DAG argument run_id_format for Airflow to use instead, and the user can set this to whatever they want (within some limitations of course). Maybe a Python .format() syntax or Jinja2.
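
To make the .format() idea concrete, here is a minimal sketch in plain Python. It is not Airflow code: render_run_id and the template field names (execution_date, next_execution_date) are hypothetical, and run_id_format is only the argument name proposed in this comment, not an existing DAG parameter.

```python
from datetime import datetime, timedelta, timezone


def render_run_id(
    run_id_format: str,
    execution_date: datetime,
    next_execution_date: datetime,
) -> str:
    """Render a run_id from a str.format() template (hypothetical helper)."""
    # Only these two fields are exposed to the template; an unknown field
    # name raises KeyError, which is one way to enforce "some limitations".
    return run_id_format.format(
        execution_date=execution_date,
        next_execution_date=next_execution_date,
    )


# A daily run whose interval starts on 2022-04-11 and ends one day later.
start = datetime(2022, 4, 11, tzinfo=timezone.utc)
run_id = render_run_id(
    "scheduled__{execution_date:%Y-%m-%d}__ran_{next_execution_date:%A}",
    start,
    start + timedelta(days=1),
)
print(run_id)
```

Because datetime objects accept strftime codes inside a format spec, a template like this can surface the weekday without any custom Python in the DAG file, which would also keep it serializable.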

ashb commented, Aug 17, 2022 (0 reactions)

I’m looking at making this a thing that the timetable controls.

(That does mean running timetable code in the scheduler “hot loop” so needs some careful thought)
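
As a rough illustration of what "the timetable controls it" could look like, here is a sketch using stand-in classes only. DataInterval and DailyTimetable below are hypothetical and do not reproduce Airflow's Timetable interface; the point is that the run_id hook is pure string formatting with no I/O, which is the kind of work that should be safe to run in a scheduler loop.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataInterval:
    """Hypothetical stand-in for the interval a scheduled run covers."""
    start: datetime
    end: datetime


class DailyTimetable:
    """Hypothetical timetable that also decides how runs are named."""

    def next_interval(self, after: datetime) -> DataInterval:
        # One-day interval starting at the given time.
        return DataInterval(start=after, end=after + timedelta(days=1))

    def generate_run_id(self, interval: DataInterval) -> str:
        # No database or network access here, only formatting.
        return f"scheduled__{interval.start:%Y-%m-%dT%H:%M:%S}__{interval.end:%A}"


timetable = DailyTimetable()
interval = timetable.next_interval(datetime(2022, 8, 17, tzinfo=timezone.utc))
run_id = timetable.generate_run_id(interval)
print(run_id)
```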
