Allow multiple pools for one task
Hello!
Description of feature:
I think it would be helpful to allow multiple pools for one task. Currently, the `pool` argument for any class inheriting from `BaseOperator` is of type `string`, so only one pool can be specified per task. It would be useful to let `pool` be a `list` of `string` instead of a single `string`. A task would then have to wait for a free slot in every pool it declares, rather than in just the one, and while running it would occupy a slot in each of those pools.
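As a sketch of what this could look like at the DAG-author level (the list form is purely hypothetical and is not accepted by today's Airflow; the operator, task id, command, and pool names below are just for illustration):

```python
from airflow.operators.bash import BashOperator

# Today: `pool` must be a single pool name (str), e.g.
#   BashOperator(task_id="t1", bash_command="...", pool="resource_a")

# Proposed (hypothetical, not supported by current Airflow): also accept a list
# of pool names, meaning the task must hold a slot in every listed pool while it runs.
task = BashOperator(
    task_id="needs_both_resources",
    bash_command="run_job.sh",
    pool=["resource_a", "resource_b"],  # hypothetical list form
    pool_slots=1,
)
```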
Use case:
I have some tasks that require multiple resources at once. I cannot split them into separate tasks that each need only one resource, because they need the two (or more) resources simultaneously to do their work. I also have tasks that need only one of the resources, so a single combined pool covering both resources would not work either. Example:
- Task 1 requires resources A and B
- Task 2 requires resource A
- Task 3 requires resource B
- Resource A can only handle 4 connections; resource B can only handle 16.

Task 1 would need to be in both pool A and pool B, and this is not possible today since I can only specify one pool.
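A concrete sketch of that example, again assuming the hypothetical list-valued `pool` (the DAG id, pool names, and commands are made up; the pools themselves can already be created today, e.g. with `airflow pools set resource_a 4 "Resource A"` in Airflow 2):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="multi_pool_use_case",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
) as dag:
    # Task 1 needs resources A and B at the same time, so it should hold a
    # slot in both pools for its whole runtime.
    task_1 = BashOperator(
        task_id="task_1",
        bash_command="use_a_and_b.sh",
        pool=["resource_a", "resource_b"],  # hypothetical; only a single str works today
    )

    # Tasks 2 and 3 each need only one resource; single pools already work today.
    task_2 = BashOperator(task_id="task_2", bash_command="use_a.sh", pool="resource_a")
    task_3 = BashOperator(task_id="task_3", bash_command="use_b.sh", pool="resource_b")
```

With pool `resource_a` sized at 4 slots and `resource_b` at 16, task 1 would count against both limits, while tasks 2 and 3 would only count against their own.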
What do I want to happen?
Allow multiple pools to be specified when creating a task. I looked into the Airflow source code, and the assumption that a task belongs to exactly one pool runs deep, down into the SQL layer, so I cannot simply fork Airflow and add this feature myself: the change is not small, and I do not understand Airflow internals well enough to make it.
Top GitHub Comments
To motivate this a little bit further, the following use-case would also be solved with this PR:
When we use the KubernetesPodOperator, we launch pods in namespaces. These namespaces have resource limits, but Airflow is currently unaware of them, so once we hit a limit Airflow keeps scheduling tasks that then fail immediately. We would therefore like to put each task in two pools: one representing the memory limit and one representing the CPU limit. This really would be an essential feature for larger Kubernetes deployments.
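As a sketch under the same hypothetical list form (the pool names, namespace, image, and sizing below are made up, and the provider import path may differ between provider versions), a pod-launching task could then be throttled by both a CPU pool and a memory pool:

```python
# Import path for the cncf.kubernetes provider in Airflow 2.x; newer provider
# releases may also expose the operator under ...operators.pod.
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

# Hypothetical pools: "ns_cpu" sized to the namespace's CPU quota and
# "ns_memory" sized to its memory quota.
run_pod = KubernetesPodOperator(
    task_id="run_pod",
    namespace="team-namespace",
    name="run-pod",
    image="busybox",
    cmds=["sh", "-c", "echo hello"],
    pool=["ns_cpu", "ns_memory"],  # hypothetical list form, not valid today
    pool_slots=2,                  # e.g. this pod consumes 2 units of each quota
)
```

One open design question this raises is whether a single `pool_slots` value would apply uniformly to every listed pool or would need to be specified per pool (e.g. 2 CPU units but 4 memory units).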
Rather detailed - look at the other (completed) AIPs; they are a much better illustration of the expected level of detail than I could give here.
Just to set expectations: this is how things work in open source. Things get implemented when someone implements them. If you want something implemented, you either do it yourself or find someone who takes an interest and implements it. This project is developed by the community and run under the Apache Software Foundation's rules, where anyone can contribute.