Allow multiple pools for one task
Hello!
Description of feature:
I think it would be helpful to allow multiple pools for one task. Currently, the `pool` argument for any class inheriting from `BaseOperator` is of type `string`, so only one pool can be specified per task. It would be useful to let `pool` be a `list` of `string` instead of a single `string`. A task would then have to wait for a free slot in every pool it declares, rather than in just the one, and while running it would occupy a slot in each of those pools.
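As a sketch of what this could look like at the DAG-author level (the list form is purely hypothetical and is not accepted by today's Airflow; the operator, task id, command, and pool names below are just for illustration):

```python
from airflow.operators.bash import BashOperator

# Today: `pool` must be a single pool name (str), e.g.
#   BashOperator(task_id="t1", bash_command="...", pool="resource_a")

# Proposed (hypothetical, not supported by current Airflow): also accept a list
# of pool names, meaning the task must hold a slot in every listed pool while it runs.
task = BashOperator(
    task_id="needs_both_resources",
    bash_command="run_job.sh",
    pool=["resource_a", "resource_b"],  # hypothetical list form
    pool_slots=1,
)
```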
Use case:
I have some tasks that require multiple resources at once. I cannot split them into separate tasks that each need only one resource, because they need the two (or more) resources simultaneously to do their work. I also have tasks that need only one of the resources, so a single combined pool covering both resources would not work either. Example:
- Task 1 requires resources A and B
- Task 2 requires resource A
- Task 3 requires resource B
- Resource A can only handle 4 connections; resource B can only handle 16.

Task 1 would need to be in both pool A and pool B, and this is not possible today since I can only specify one pool.
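A concrete sketch of that example, again assuming the hypothetical list-valued `pool` (the DAG id, pool names, and commands are made up; the pools themselves can already be created today, e.g. with `airflow pools set resource_a 4 "Resource A"` in Airflow 2):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="multi_pool_use_case",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
) as dag:
    # Task 1 needs resources A and B at the same time, so it should hold a
    # slot in both pools for its whole runtime.
    task_1 = BashOperator(
        task_id="task_1",
        bash_command="use_a_and_b.sh",
        pool=["resource_a", "resource_b"],  # hypothetical; only a single str works today
    )

    # Tasks 2 and 3 each need only one resource; single pools already work today.
    task_2 = BashOperator(task_id="task_2", bash_command="use_a.sh", pool="resource_a")
    task_3 = BashOperator(task_id="task_3", bash_command="use_b.sh", pool="resource_b")
```

With pool `resource_a` sized at 4 slots and `resource_b` at 16, task 1 would count against both limits, while tasks 2 and 3 would only count against their own.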
What do I want to happen?
Allow multiple pools to be specified when creating a task. I looked into the Airflow source code, and the assumption that a task belongs to exactly one pool runs deep, down into the SQL layer, so I cannot simply fork Airflow and add this feature myself: the change is not small, and I do not understand Airflow internals well enough to make it.
Top GitHub Comments
To motivate this a little bit further, the following use-case would also be solved with this PR:
When we use the KubernetesPodOperator, we launch pods in namespaces. These namespaces have resource limits, but Airflow is currently unaware of them, so once we hit a limit Airflow keeps scheduling tasks that then fail immediately. We would therefore like to put each task in two pools: one representing the memory limit and one representing the CPU limit. This really would be an essential feature for larger Kubernetes deployments.
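As a sketch under the same hypothetical list form (the pool names, namespace, image, and sizing below are made up, and the provider import path may differ between provider versions), a pod-launching task could then be throttled by both a CPU pool and a memory pool:

```python
# Import path for the cncf.kubernetes provider in Airflow 2.x; newer provider
# releases may also expose the operator under ...operators.pod.
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

# Hypothetical pools: "ns_cpu" sized to the namespace's CPU quota and
# "ns_memory" sized to its memory quota.
run_pod = KubernetesPodOperator(
    task_id="run_pod",
    namespace="team-namespace",
    name="run-pod",
    image="busybox",
    cmds=["sh", "-c", "echo hello"],
    pool=["ns_cpu", "ns_memory"],  # hypothetical list form, not valid today
    pool_slots=2,                  # e.g. this pod consumes 2 units of each quota
)
```

One open design question this raises is whether a single `pool_slots` value would apply uniformly to every listed pool or would need to be specified per pool (e.g. 2 CPU units but 4 memory units).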
Rather detailed - look at the other (completed) AIPs; they are a much better illustration of the expected level of detail than I could give here.
Just to set expectations: this is how things work in open source. Things get implemented when someone implements them. If you want something implemented, you either do it yourself or find someone who takes an interest and implements it. This project is developed by the community and run under the Apache Software Foundation's rules, where anyone can contribute.