Names for expanded tasks
Description
Airflow currently exposes map_index to the user as a way of distinguishing between tasks in an expansion. The index is unlikely to be meaningful to the user. They probably have their own label for this action. I’m requesting that we allow them to add that label.
To see the problem, consider a dag that sends email to a list of users which is generated at runtime:
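from textwrap import dedent

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(...) as dag:

    # Produces one env dict per user; the list is only known at runtime.
    @dag.task
    def get_account_status():
        return [
            {
                "NAME": "Wintermute",
                "EMAIL": "wintermute@tessier-ashpool.com",
                "STATUS": "active",
            },
            {
                "NAME": "Hojo",
                "EMAIL": "ops@research.shinra.com",
                "STATUS": "delinquent",
            },
        ]

    # One mapped task instance is created per env dict returned above.
    BashOperator.partial(
        task_id="send_email",
        bash_command=dedent(
            """
            cat <<- EOF | tee | mailx -s "your account" $EMAIL
            Dear $NAME,
            Your account status is $STATUS.
            EOF
            """
        ),
    ).expand(env=get_account_status())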
Notice that in the grid view, it’s not obvious which task goes with which user:
Use case/motivation
I’d like to be able to explicitly assign a name to each expanded task, so that I can later find the right one. I would like this name to be used (when available) anywhere that the user interacts with the expanded task.
In cases where the user provides no names, perhaps we can generate some. For instance, this expansion generates four instances.
BashOperator.partial(task_id="greet").expand(
    bash_command=["echo hello $USER", "echo goodbye $USER"],
    env=[{"USER": "foo"}, {"USER": "bar"}],
)
The friendliest way would be to use the requested feature to name each task:
- hi_foo
- hi_bar
- bye_foo
- bye_bar
As it is, the user will see:
- 1
- 2
- 3
- 4
But if the user doesn’t give names, maybe we should generate some names for them:
- bash_command_1_env_1
- bash_command_1_env_2
- bash_command_2_env_1
- bash_command_2_env_2
I don’t know. I’m creating this issue so we have a place to discuss it.
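As a starting point for that discussion, here is a minimal sketch of how both naming schemes could be derived from the arguments passed to expand(). It is plain Python and does not touch Airflow; the helper expansion_names and its labels parameter are hypothetical and only illustrate the cross-product ordering of the four instances.

```python
from itertools import product


def expansion_names(mapped_kwargs, labels=None):
    """Return one name per expanded instance, in cross-product order.

    mapped_kwargs: argument name -> list of values, as passed to expand().
    labels: optional argument name -> list of user-chosen labels (hypothetical).
    """
    arg_names = list(mapped_kwargs)
    index_ranges = [range(len(mapped_kwargs[name])) for name in arg_names]
    names = []
    for positions in product(*index_ranges):
        if labels is not None:
            # Use the label the user supplied for each argument value.
            parts = [labels[name][pos] for name, pos in zip(arg_names, positions)]
        else:
            # Fall back to "<argument>_<1-based position>".
            parts = [f"{name}_{pos + 1}" for name, pos in zip(arg_names, positions)]
        names.append("_".join(parts))
    return names


mapped = {
    "bash_command": ["echo hello $USER", "echo goodbye $USER"],
    "env": [{"USER": "foo"}, {"USER": "bar"}],
}

generated = expansion_names(mapped)
labelled = expansion_names(
    mapped, labels={"bash_command": ["hi", "bye"], "env": ["foo", "bar"]}
)
print(generated)
print(labelled)
```

Whether labels should be given per argument value, as assumed here, or per expanded instance is exactly the kind of design question left open above.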
Related issues
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
Issue Analytics
- State:
- Created a year ago
- Reactions:1
- Comments:18 (18 by maintainers)

It just occurred to me that this is essentially a part of #22073. What we (users) actually want is a more customisable way to identify things (in this instance, a mapped task instance), and if we look past the assumption that a mapped task instance is “task_id + map_index”, we simply need a better way for the user to tell “what is this thing” in the Airflow UI. So let’s keep track of that issue instead to make sure whatever solution we come up with for it correctly considers map_index.
I think you can forget about this.
You’ve just hit the reality train (or rather the reality train hit you 😃)
Look there: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-42+Dynamic+Task+Mapping - see this part:
“Rather than overloading the task_id argument to airflow tasks run (i.e. having a task_id of run_after_loop[0]) we will add a new --mapping-id argument to airflow tasks run – this value will be a ~JSON-encoded~ an integer specifying the index/position of the mapping.” (see also comments in the doc).
We have to support MySQL, and the problem with MySQL is that index key size is limited. VERY limited. Depending on the type of encoding it might even be 760 characters or so. And task_id + dag_id + (string) task_index already exceeds the limit by far. There is no way around it, and this was the main reason (I believe) we had to use an integer, even though originally we planned not even a name but a JSON-encoded list of parameters, very similar to what you proposed (which was far better for uniqueness because it was automated).
But this is just what I saw by observing it being implemented, so I might be wrong about whether that was the only or main reason for changing the original decision.
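To make the key-size argument concrete, here is some rough arithmetic. Everything in it is an assumption for illustration rather than something stated in this thread: ID columns of 250 characters, 4 bytes per character (utf8mb4), a key made of dag_id, task_id and run_id plus the map index, and InnoDB index key limits of 767 or 3072 bytes depending on row format. The real numbers depend on the MySQL version, row format and collation in use.

```python
def index_key_bytes(string_columns, chars_per_column, bytes_per_char, int_columns=0):
    """Worst-case size in bytes of a composite index key."""
    return string_columns * chars_per_column * bytes_per_char + int_columns * 4


ID_CHARS = 250        # assumed length of each ID column
BYTES_PER_CHAR = 4    # assumed utf8mb4; utf8mb3 would be 3
LIMIT_SMALL = 767     # InnoDB limit for COMPACT/REDUNDANT row formats
LIMIT_LARGE = 3072    # InnoDB limit for DYNAMIC/COMPRESSED row formats

# Three string IDs plus an integer map index.
integer_key = index_key_bytes(3, ID_CHARS, BYTES_PER_CHAR, int_columns=1)
# The same three IDs plus a fourth string column holding a name.
string_key = index_key_bytes(4, ID_CHARS, BYTES_PER_CHAR)

print(integer_key, integer_key <= LIMIT_LARGE)
print(string_key, string_key <= LIMIT_LARGE)
```

Under these assumptions the integer index keeps the key just inside the larger limit, while a string name as part of the key would not fit. A user-facing label would therefore probably have to live outside the key, which is in line with tracking this under #22073 instead.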