
Queue support for DaskExecutor using Dask Worker Resources

See original GitHub issue

Currently, Airflow’s DaskExecutor does not support specifying queues for tasks, because Dask lacks an explicit queue feature. However, queues can be reliably mimicked using Dask worker resources (details here). The setup would look something like this:

# Start a Dask worker that can service Airflow tasks submitted with
# queue="queue_name_1" or queue="queue_name_2"
$ dask-worker <address> --resources "queue_name_1=inf, queue_name_2=inf"

(Unfortunately, as far as I know you need to provide a finite resource limit for the workers, so you’d need to pass an arbitrarily large limit. I think that minor inconvenience is worth accepting to get queue functionality in the DaskExecutor.)

# airflow/executors/dask_executor.py
# (method of DaskExecutor; subprocess, Optional/Any, TaskInstanceKey, CommandType,
# AirflowException, and NOT_STARTED_MESSAGE come from the module's existing imports)
def execute_async(
    self,
    key: TaskInstanceKey,
    command: CommandType,
    queue: Optional[str] = None,
    executor_config: Optional[Any] = None,
) -> None:

    self.validate_command(command)

    def airflow_run():
        return subprocess.check_call(command, close_fds=True)

    if not self.client:
        raise AirflowException(NOT_STARTED_MESSAGE)

    # ---------------- change made here ----------------
    # Map the Airflow queue name to a Dask worker resource: the task will
    # only be scheduled on workers started with a matching resource.
    resources = None
    if queue:
        resources = {queue: 1}

    future = self.client.submit(airflow_run, pure=False, resources=resources)
    self.futures[future] = key  # type: ignore
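
To see the routing end to end, here is a minimal sketch of the Airflow side, assuming the patched executor above and the worker command shown earlier. The DAG and task names are hypothetical; queue itself is a standard BaseOperator argument:

# example_dag.py (hypothetical)
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dask_queue_demo",
    start_date=datetime(2021, 7, 1),
    schedule_interval=None,
) as dag:
    # With the change above, this task is submitted to Dask with
    # resources={"queue_name_1": 1}, so only workers started with a
    # "queue_name_1" resource will pick it up.
    hello = BashOperator(
        task_id="hello",
        bash_command="echo hello",
        queue="queue_name_1",
    )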

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

1 reaction
aa1371 commented, Jul 6, 2021

@fjetter thanks for the insight. I think worker resources are the best way forward, since they let you tag your workers at creation time and then dispatch your Airflow tasks based on that tag name (i.e. the queue name), without needing to keep track of specific workers within Airflow. It also turns out there is a way to define infinite worker resources (https://github.com/dask/distributed/discussions/5010#discussioncomment-971219), so you can define the resource on the worker without having to provide an arbitrarily large limit, or having to worry about how many tasks could possibly run concurrently on your worker.
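
For reference, a minimal sketch of the workaround from the linked discussion, using the Python worker API (resource amounts are floats, so float("inf") is accepted); the scheduler address is hypothetical:

# Start a worker with an effectively unlimited "queue_name_1" resource.
import asyncio

from dask.distributed import Worker

async def main():
    worker = await Worker(
        "tcp://127.0.0.1:8786",  # hypothetical scheduler address
        resources={"queue_name_1": float("inf")},
    )
    await worker.finished()

asyncio.run(main())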

0 reactions
fjetter commented, Jul 6, 2021

I’m not familiar enough with the queue functionality of Airflow to know what the expected behaviour should be. In Dask we have, broadly speaking, two to three mechanisms to limit concurrency at the task level and/or control assignment to workers.

If you want to limit the number of assigned tasks, i.e. ensure that a task is not assigned to a worker before it is allowed to execute, resources are the way to go.

If you want to control which workers are allowed to work on a given task, the workers keyword might be a better fit, but it doesn’t control concurrency (other than the intrinsic limit a single worker imposes).

If you want to ensure that only a limited number of tasks execute at once, but it is fine for them to be assigned to a worker (and possibly even block a worker), we have a Semaphore that could be used.

Which one is best depends on how queuing in Airflow is supposed to work.
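
A hedged sketch of the three mechanisms described above, assuming a running scheduler; addresses and limits are illustrative only:

# Three ways to limit concurrency / control task placement in Dask.
from dask.distributed import Client, Semaphore

client = Client("tcp://127.0.0.1:8786")  # hypothetical scheduler address

def work():
    return 42

# 1. Resources: the task is not assigned until a worker has a free
#    "queue_name_1" slot (workers must be started with that resource).
f1 = client.submit(work, resources={"queue_name_1": 1}, pure=False)

# 2. The workers keyword: restrict the task to specific workers; this
#    does not limit concurrency beyond each worker's own thread pool.
f2 = client.submit(work, workers=["tcp://10.0.0.1:34567"], pure=False)

# 3. Semaphore: the task may be assigned to any worker, but at most
#    max_leases holders run the guarded section at once (a waiting
#    task can occupy a worker thread).
sem = Semaphore(max_leases=2, name="airflow_queue")

def guarded_work():
    with sem:
        return work()

f3 = client.submit(guarded_work, pure=False)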

