Programmatic customization of run_id for scheduled DagRuns
Description
Allow DAG authors to control how run_ids are generated for created DagRuns. Currently the only way to specify a DagRun’s run_id is through the manual trigger workflow, either through the CLI or the API, by passing in a run_id. It would be great if DAG authors were able to write custom logic to generate run_ids from scheduled DagRunIntervals.
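As a rough illustration of the kind of custom logic being requested, here is a plain-Python sketch that builds a run_id from a scheduled run's interval. The function name `make_run_id`, its two interval parameters, and the choice to keep a `scheduled__` prefix are all hypothetical; none of this is an existing Airflow API.

```python
from datetime import datetime, timezone

# Locale-independent weekday names, indexed by datetime.weekday() (Monday == 0).
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def make_run_id(interval_start: datetime, interval_end: datetime) -> str:
    """Hypothetical run_id generator for a scheduled DagRun.

    A scheduled run executes once its interval has closed, so the end of
    the interval is the date a human reading the UI actually cares about.
    interval_start is accepted to mirror the interval shape but is unused here.
    """
    return "scheduled__{}__{}".format(
        WEEKDAYS[interval_end.weekday()],  # e.g. "Tuesday"
        interval_end.date().isoformat(),   # e.g. "2021-06-15"
    )


print(make_run_id(datetime(2021, 6, 14, tzinfo=timezone.utc),
                  datetime(2021, 6, 15, tzinfo=timezone.utc)))
```

Keeping a `scheduled__` prefix is only a design choice in this sketch, so that scheduled runs stay distinguishable from manually triggered ones in the dropdown.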
Use case/motivation
In Airflow 1.x, the semantics of execution_date were burdensome enough that DAG authors would subclass DAG and override create_dagrun, so that new DagRuns were created with run_ids that gave context about the DagRun’s semantics. For example,
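from airflow import DAG
class RunIdDAG(DAG):  # subclass name chosen for illustration
    def create_dagrun(self, **kwargs):
        # run_id = "<execution_date>_<date of the following schedule>"; both parts must be strings to concatenate
        kwargs['run_id'] = '{}_{}'.format(kwargs['execution_date'].isoformat(), self.following_schedule(kwargs['execution_date']).date())
        return super().create_dagrun(**kwargs)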
would make the DagRun dropdown in the UI display the weekday on which the DAG actually ran.
After upgrading to Airflow 2.0, where the scheduler works from serialized DAGs, overridden methods no longer exist on the SerializedDAG, so the UI dropdown is back to showing scheduled__<execution_date> values. It would be great if some functionality were exposed, either through the DAG or just in the UI, to display meaningful values in the DagRun dropdown.
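For reference, the scheduled__<execution_date> values follow a fixed pattern. The sketch below reproduces that pattern in plain Python so it runs without Airflow installed; it is meant to mirror what Airflow 2.x's `DagRun.generate_run_id` appears to do (run type, a double underscore, then the ISO-8601 execution date), but treat it as an approximation rather than Airflow's own code. The helper name `default_run_id` is hypothetical.

```python
from datetime import datetime, timezone


def default_run_id(run_type: str, execution_date: datetime) -> str:
    # Run type (e.g. "scheduled" or "manual"), a double underscore,
    # then the full ISO-8601 timestamp of the execution date.
    return f"{run_type}__{execution_date.isoformat()}"


print(default_run_id("scheduled", datetime(2021, 6, 14, tzinfo=timezone.utc)))
# prints: scheduled__2021-06-14T00:00:00+00:00
```

Nothing in this string says when the run actually executed, which is the gap the 1.x create_dagrun override used to fill.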
Related issues
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
Issue Analytics
- Created 2 years ago
- Reactions: 2
- Comments:8 (5 by maintainers)
One possible solution would be to allow a DAG argument, run_id_format, for Airflow to use instead; users could set it to whatever they want (within some limitations, of course). Maybe Python .format() syntax or Jinja2.
I’m looking at making this something the timetable controls.
(That does mean running timetable code in the scheduler “hot loop”, so it needs some careful thought.)
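A minimal sketch of the run_id_format idea using Python's str.format(), assuming the format string can reference three hypothetical fields: `run_type`, `data_interval_start` and `data_interval_end`. The "limitations" are stood in for by an assumed 250-character cap and an assumed allowed-character set; neither is taken from Airflow, and `render_run_id` is not an Airflow function.

```python
import re
from datetime import datetime, timezone

# Hypothetical constraints standing in for "within some limitations":
MAX_RUN_ID_LEN = 250                                # assumed length cap
ALLOWED_RUN_ID = re.compile(r"[A-Za-z0-9_.:+-]+")   # assumed character set


def render_run_id(run_id_format: str, run_type: str,
                  data_interval_start: datetime,
                  data_interval_end: datetime) -> str:
    """Render a hypothetical DAG-level run_id_format with str.format()."""
    run_id = run_id_format.format(
        run_type=run_type,
        data_interval_start=data_interval_start,
        data_interval_end=data_interval_end,
    )
    # Reject anything too long or containing characters outside the allowed set.
    if len(run_id) > MAX_RUN_ID_LEN or ALLOWED_RUN_ID.fullmatch(run_id) is None:
        raise ValueError(f"invalid run_id rendered from format: {run_id!r}")
    return run_id


# datetime supports strftime codes directly inside format fields.
fmt = "{run_type}__{data_interval_end:%Y-%m-%d}"
print(render_run_id(fmt, "scheduled",
                    datetime(2021, 6, 14, tzinfo=timezone.utc),
                    datetime(2021, 6, 15, tzinfo=timezone.utc)))
# prints: scheduled__2021-06-15
```

Since the format would be rendered for every new DagRun inside the scheduler loop, a plain str.format() call is likely cheaper than rendering a Jinja2 template, which is one argument for the simpler syntax; a timetable-controlled hook could call the same kind of function.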