TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator
Apache Airflow version: v2.0.0 (Git version: release:2.0.0+ab5f770bfcd8c690cbe4d0825896325aca0beeca)
Docker version: Docker version 20.10.1, build 831ebeae96
Environment:
- Cloud provider or hardware configuration: local setup, docker engine in swarm mode, docker stack deploy
- OS (e.g. from /etc/os-release): Manjaro Linux
- Kernel (e.g. uname -a): 5.9.11
- Install tools:
  - docker airflow image apache/airflow:2.0.0-python3.8 (hash fe4a64af9553)
- Others:
What happened:
When using DockerSwarmOperator (either the contrib or the providers module) together with the default enable_logging=True option, tasks do not succeed and stay in state running. When checking docker service logs I can clearly see that the container ran and ended successfully. Airflow however does not recognize that the container finished and keeps the task in state running.
However, when using enable_logging=False AND auto_remove=False, containers are recognized as finished and tasks correctly end up in state success. When using enable_logging=False and auto_remove=True I get the following error message:
{taskinstance.py:1396} ERROR - 404 Client Error: Not Found ("service 936om1s4zso10ye5ferhvwnxn not found")
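For context, the state the operator should be picking up can be read straight from the Docker API. The sketch below uses docker-py and reuses the service ID from the error message above purely as a hypothetical placeholder; it is roughly the kind of check that shows the swarm task itself finished, even while Airflow keeps the TaskInstance in state running:

import docker

# Talk to the local daemon through the socket that is also mounted into the Airflow containers.
client = docker.APIClient(base_url="unix://var/run/docker.sock")

# Hypothetical service ID, e.g. taken from `docker service ls`; the ID from the
# error message above is reused here only as a placeholder.
service_id = "936om1s4zso10ye5ferhvwnxn"

# Each swarm task of a finished one-shot service reports state "complete".
for task in client.tasks(filters={"service": service_id}):
    print(task["Status"]["State"], task["Status"].get("Message"))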
What you expected to happen:
When I run a DAG with DockerSwarmOperators in it, I expect the docker containers to be distributed across the docker swarm and the container logs and states to be correctly tracked by the DockerSwarmOperator. That is, with the enable_logging=True option I would expect the TaskInstance's log to contain the logging output of the docker container/service. Furthermore, with the auto_remove=True option I would expect the docker services to be removed after the TaskInstance has finished successfully.
It looks like something is broken with the enable_logging and auto_remove=True options.
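For reference, the only combination that currently completes cleanly in my setup is logging and auto-removal both disabled, as described above. A minimal sketch of a single task configured that way is shown below (task_id and variable name are made up); the leftover service then has to be cleaned up manually with docker service rm:

from airflow.providers.docker.operators.docker_swarm import DockerSwarmOperator

# Per the observations above, enable_logging=False together with auto_remove=False
# is the only combination that currently ends in state "success". To be placed
# inside a DAG context, e.g. the `with DAG(...)` block of the reproduction DAG below.
workaround_task = DockerSwarmOperator(
    task_id="docker_swarm_workaround",
    image="hello-world:latest",
    api_version="auto",
    docker_url="unix://var/run/docker.sock",
    enable_logging=False,
    auto_remove=False,
)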
How to reproduce it:
Dockerfile
FROM apache/airflow:2.0.0-python3.8
ARG DOCKER_GROUP_ID
USER root
RUN groupadd --gid $DOCKER_GROUP_ID docker \
&& usermod -aG docker airflow
USER airflow
airflow user needs to be in the docker group to have access to the docker daemon
build the Dockerfile
docker build --build-arg DOCKER_GROUP_ID=$(getent group docker | awk -F: '{print $3}') -t docker-swarm-bug .
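As an optional sanity check, you can verify from inside a container built from this image that the airflow user can actually reach the Docker daemon through the mounted socket. This is a minimal sketch and assumes the docker Python package is available in the image (it is when the docker provider is installed, as in the reference image):

import docker

# Connect to the host's Docker daemon through the socket mounted into the container.
client = docker.APIClient(base_url="unix://var/run/docker.sock")

# If the group setup above is correct, this prints the host's Docker engine version;
# a permission error here means the airflow user cannot access /var/run/docker.sock.
print(client.version()["Version"])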
docker-stack.yml
version: "3.2"
networks:
  airflow:
services:
  postgres:
    image: postgres:13.1
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_DB=airflow
      - POSTGRES_PASSWORD=airflow
      - PGDATA=/var/lib/postgresql/data/pgdata
    ports:
      - 5432:5432
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./database/data:/var/lib/postgresql/data/pgdata
      - ./database/logs:/var/lib/postgresql/data/log
    command: >
      postgres
      -c listen_addresses=*
      -c logging_collector=on
      -c log_destination=stderr
      -c max_connections=200
    networks:
      - airflow
  redis:
    image: redis:5.0.5
    environment:
      REDIS_HOST: redis
      REDIS_PORT: 6379
    ports:
      - 6379:6379
    networks:
      - airflow
  webserver:
    env_file:
      - .env
    image: docker-swarm-bug:latest
    ports:
      - 8080:8080
    volumes:
      - ./airflow_files/dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./files:/opt/airflow/files
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      restart_policy:
        condition: on-failure
        delay: 8s
        max_attempts: 3
    depends_on:
      - postgres
      - redis
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /opt/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
    networks:
      - airflow
  flower:
    image: docker-swarm-bug:latest
    env_file:
      - .env
    ports:
      - 5555:5555
    depends_on:
      - redis
    deploy:
      restart_policy:
        condition: on-failure
        delay: 8s
        max_attempts: 3
    volumes:
      - ./logs:/opt/airflow/logs
    command: celery flower
    networks:
      - airflow
  scheduler:
    image: docker-swarm-bug:latest
    env_file:
      - .env
    volumes:
      - ./airflow_files/dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./files:/opt/airflow/files
      - /var/run/docker.sock:/var/run/docker.sock
    command: scheduler
    deploy:
      restart_policy:
        condition: on-failure
        delay: 8s
        max_attempts: 3
    networks:
      - airflow
  worker:
    image: docker-swarm-bug:latest
    env_file:
      - .env
    volumes:
      - ./airflow_files/dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./files:/opt/airflow/files
      - /var/run/docker.sock:/var/run/docker.sock
    command: celery worker
    depends_on:
      - scheduler
    deploy:
      restart_policy:
        condition: on-failure
        delay: 8s
        max_attempts: 3
    networks:
      - airflow
  initdb:
    image: docker-swarm-bug:latest
    env_file:
      - .env
    volumes:
      - ./airflow_files/dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./files:/opt/airflow/files
      - /var/run/docker.sock:/var/run/docker.sock
    entrypoint: /bin/bash
    deploy:
      restart_policy:
        condition: on-failure
        delay: 8s
        max_attempts: 5
    command: -c "airflow db init && airflow users create --firstname admin --lastname admin --email admin --password admin --username admin --role Admin"
    depends_on:
      - redis
      - postgres
    networks:
      - airflow
docker_swarm_bug.py
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.providers.docker.operators.docker_swarm import DockerSwarmOperator

# you can also try DockerSwarmOperator from the contrib module, it shouldn't make a difference
# from airflow.contrib.operators.docker_swarm_operator import DockerSwarmOperator

default_args = {
    "owner": "airflow",
    "start_date": "2021-01-14",
}

with DAG(
    "docker_swarm_bug", default_args=default_args, schedule_interval="@once"
) as dag:
    start_op = BashOperator(
        task_id="start_op", bash_command="echo start testing multiple dockers",
    )

    docker_swarm = list()
    for i in range(16):
        docker_swarm.append(
            DockerSwarmOperator(
                task_id=f"docker_swarm_{i}",
                image="hello-world:latest",
                force_pull=True,
                auto_remove=True,
                api_version="auto",
                docker_url="unix://var/run/docker.sock",
                network_mode="bridge",
                enable_logging=False,
            )
        )

    finish_op = BashOperator(
        task_id="finish_op", bash_command="echo finish testing multiple dockers",
    )

    start_op >> docker_swarm >> finish_op
create directories, copy DAG and set permissions
mkdir -p airflow_files/dags
cp docker_swarm_bug.py airflow_files/dags/
mkdir logs
mkdir files
sudo chown -R 50000 airflow_files logs files
uid 50000 is the id of the airflow user inside the docker images
deploy docker-stack.yml
docker stack deploy --compose-file docker-stack.yml airflow
trigger the DAG docker_swarm_bug in the UI
Anything else we need to know:
The problem occurs with the enable_logging=True option.
Top GitHub Comments
@eladkal Sorry to bump an old issue, but it seems to persist with version release:2.2.3+06c82e17e9d7ff1bf261357e84c6013ccdb3c241
Containers are spawned, complete successfully, and are removed, but Airflow does not mark them as completed if enable_logging=True.
Indeed. You should not do it.
Please @alexcolpitts96 @FriedrichSal open new issues with a detailed description of your circumstances, logs, and reproduction cases. Commenting on old, closed issues (and especially "I have the same issue") adds precisely 0 value without logs and details. Please watch my talk from the Summit to understand why: https://www.youtube.com/watch?v=G6VjYvKr2wQ&list=PLGudixcDaxY2LxjeHpZRtzq7miykjjFOn&index=54