
Windows support for Airflow


Description

Currently, the Airflow project uses PEP 3143-style daemons to launch tasks (as implemented in https://pypi.org/project/python-daemon/); however, python-daemon targets Unix. As a result, running Airflow on Windows requires multiple levels of abstraction, each with its own problems. Would it be possible to use something like daemoniker (https://daemoniker.readthedocs.io/en/latest/) to launch tasks? What are the challenges and issues?
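
For context, the existing daemonization path in airflow/bin/cli.py is built on python-daemon's DaemonContext, roughly like the simplified sketch below (paraphrased rather than copied from the Airflow source, and reusing the same setup_locations/args/job names that appear in cli.py). DaemonContext relies on os.fork() and POSIX terminal handling, which is why it cannot work natively on Windows:

# Simplified sketch of the current (Unix-only) daemonization, using
# python-daemon (PEP 3143); not a verbatim copy of the Airflow source.
import daemon
from daemon.pidfile import TimeoutPIDLockFile

pid, stdout, stderr, log_file = setup_locations(
    "scheduler", args.pid, args.stdout, args.stderr, args.log_file)

ctx = daemon.DaemonContext(
    pidfile=TimeoutPIDLockFile(pid, -1),
    stdout=open(stdout, "w+"),
    stderr=open(stderr, "w+"),
)
with ctx:
    # os.fork() happens inside DaemonContext.__enter__, which is
    # unavailable on Windows.
    job.run()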

In machine learning workflows with large datasets, it is a huge time-saver if the pipeline tasks can run on the GPU. WSL 1 does not support GPU passthrough; Docker through WSL 2 supports GPU passthrough only with the Insiders build, and it additionally has networking issues when connected to a VPN (https://github.com/microsoft/WSL/issues/5068).

Use case / motivation

Natively running Airflow on Windows, without WSL 1/2 or Docker. This is helpful in cases where the company ecosystem is Windows-based.

Possible implementation

The daemon module is only used to daemonize the scheduler and webserver. Here is sample code that runs the scheduler (Airflow origin/v1-10-stable) using daemoniker; comments are welcome:

# airflow/bin/cli.py
from daemoniker import Daemonizer

...

if args.daemon:
    with Daemonizer() as (is_setup, daemonizer):
        if is_setup:
            # Runs only in the parent process, before daemonization.
            pid, stdout, stderr, log_file = setup_locations(
                "scheduler", args.pid, args.stdout, args.stderr, args.log_file)

        # Fork/respawn point: the daemon's stdout and stderr are redirected
        # to the files resolved above. After the with block, the parent has
        # exited and only the daemonized child continues to job.run().
        _is_parent = daemonizer(
            pid,
            stdout_goto=stdout,
            stderr_goto=stderr
        )

    job.run()
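
If this approach were adopted, signal handling would also need attention, since Windows has no POSIX signals. A minimal sketch based on daemoniker's documented SignalHandler1, assuming the same pid file path resolved by setup_locations() above:

from daemoniker import SignalHandler1

# Hypothetical companion to the snippet above: daemoniker's cross-platform
# signal handler, so that Ctrl+C / taskkill can stop the daemonized
# scheduler cleanly instead of leaving a stale pid file.
sighandler = SignalHandler1(pid)
sighandler.start()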

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 4
  • Comments: 22 (17 by maintainers)

Top GitHub Comments

4 reactions
potiuk commented, Aug 19, 2020

I think it would be great if someone could invest in Windows support. I believe there are a few things: not only the daemon model, but also the Local Executor uses fork mechanisms which won't be available on Windows, and there might be some problems if you want to use the Celery Executor on Windows: https://www.distributedpython.com/2018/08/21/celery-4-windows/ There are a few POSIX-only packages used as well which might not work on Windows. And automated testing might be a problem since we are using Docker. It looks like quite a big effort to invest…
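
As an illustration of the fork limitation (not part of the original comment): Python simply does not offer the fork start method on Windows, so any executor code that assumes it fails there. A small hedged check:

import multiprocessing

# On Linux this typically prints ['fork', 'spawn', 'forkserver'];
# on Windows only ['spawn'] is available.
print(multiprocessing.get_all_start_methods())

try:
    ctx = multiprocessing.get_context("fork")
except ValueError:
    # Raised on Windows, where "fork" is not a valid start method.
    print("fork start method not available on this platform")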

1 reaction
casra-developers commented, May 26, 2021

We have a go. I will create a fork and CC you @potiuk in the PR. There are probably a lot of things we need to do, since the only goal so far was to implement enough functionality for Dask to run properly.
