log_id field is missing from log lines (ES remote logging)
Apache Airflow version: apache/airflow:1.10.11
Kubernetes version (if you are using kubernetes) (use kubectl version): v1.16.11-gke.5
Environment: GKE
What happened: The webserver doesn't fetch task logs from Elasticsearch.
What you expected to happen: Task logs are displayed in the webserver UI.
It seems the webserver queries task logs by the log_id field:
https://github.com/apache/airflow/blob/1.10.11/airflow/utils/log/es_task_handler.py#L175
but this field is missing from all log lines (which are written to stdout) when using the KubernetesExecutor. Example log line:
{"asctime": null, "filename": "standard_task_runner.py", "lineno": 77, "levelname": "INFO", "message": "Running: ['airflow', 'run', 'hello_world', 'hello_task_3', '2020-08-19T14:26:07.226064+00:00', '--job_id', '158', '--pool', 'default_pool', '--raw', '-sd', '/opt/airflow/dags/repo/dags/hello_world.py', '--cfg_path', '/tmp/tmpt7lafkaf']", "dag_id": "hello_world", "task_id": "hello_task_3", "execution_date": "2020_08_19T14_26_07_226064", "try_number": "1"}
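To make the mismatch concrete, here is a minimal sketch of the log_id the webserver would search for, built from the fields of the example line above. It assumes the default [elasticsearch] log_id_template of Airflow 1.10.x ({dag_id}-{task_id}-{execution_date}-{try_number}). The query body is only a rough raw-JSON equivalent of the handler's search (match_phrase on log_id, a range filter on offset, sorted by offset), based on a reading of the 1.10.11 handler. build_log_id and build_search_body are hypothetical helpers, not Airflow APIs.

```python
import json

# Assumption: default [elasticsearch] log_id_template in Airflow 1.10.x;
# adjust if your deployment overrides it.
LOG_ID_TEMPLATE = "{dag_id}-{task_id}-{execution_date}-{try_number}"


def build_log_id(record, template=LOG_ID_TEMPLATE):
    """Return the log_id the webserver would search for (hypothetical helper).

    With json_format enabled, the record's execution_date is already in the
    underscore form (e.g. 2020_08_19T14_26_07_226064), so it is used as-is.
    """
    return template.format(
        dag_id=record["dag_id"],
        task_id=record["task_id"],
        execution_date=record["execution_date"],
        try_number=record["try_number"],
    )


def build_search_body(log_id, offset=0):
    """Rough raw-query equivalent of the handler's search (hypothetical helper):
    documents whose log_id matches and whose offset is greater than the last
    offset already shown, sorted by offset."""
    return {
        "query": {
            "bool": {
                "must": [{"match_phrase": {"log_id": log_id}}],
                "filter": [{"range": {"offset": {"gt": offset}}}],
            }
        },
        "sort": ["offset"],
    }


# Fields copied from the example log line.
record = {
    "dag_id": "hello_world",
    "task_id": "hello_task_3",
    "execution_date": "2020_08_19T14_26_07_226064",
    "try_number": "1",
}
log_id = build_log_id(record)
print(log_id)  # hello_world-hello_task_3-2020_08_19T14_26_07_226064-1
print(json.dumps(build_search_body(log_id), indent=2))
```

Because the search filters on both log_id and offset, a document missing either field will never match, whatever else it contains.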
How to reproduce it: this is the relevant configuration we have. The scheduler and webserver run separately, and tasks run using the KubernetesExecutor (all in the same cluster/namespace):
AIRFLOW__CORE__LOGGING_LEVEL: INFO
AIRFLOW__CORE__REMOTE_LOGGING: "True"
AIRFLOW__ELASTICSEARCH__HOST: http://elasticsearch.logging:9200
AIRFLOW__ELASTICSEARCH__JSON_FORMAT: "True"
AIRFLOW__ELASTICSEARCH__WRITE_STDOUT: "True"
We are using fluentd (https://github.com/fluent/fluentd-kubernetes-daemonset) to forward log lines to Elasticsearch; all task logs are written to stdout and Elasticsearch as expected.
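Since the worker lines reach Elasticsearch without log_id or offset, something in the forwarding pipeline has to add them. The sketch below is a Python illustration of that transformation, not a fluentd configuration. LogEnricher is hypothetical, and it assumes the default log_id_template and that the webserver starts reading from offset 0. The per-log_id in-memory counter used for offset is also an assumption that only holds within a single process; a real shipper running on several nodes needs a value that stays ordered across restarts, e.g. a high-resolution timestamp.

```python
import itertools
import json
from collections import defaultdict

# Assumption: default Airflow 1.10.x log_id_template.
LOG_ID_TEMPLATE = "{dag_id}-{task_id}-{execution_date}-{try_number}"
TASK_FIELDS = ("dag_id", "task_id", "execution_date", "try_number")


class LogEnricher:
    """Hypothetical stand-in for a log-shipper filter that adds log_id and offset."""

    def __init__(self, template=LOG_ID_TEMPLATE):
        self.template = template
        # One counter per log_id, starting at 1 so every offset is strictly
        # greater than 0, the offset the webserver starts reading from.
        self._counters = defaultdict(lambda: itertools.count(1))

    def enrich(self, line):
        record = json.loads(line)
        # Non-task lines (missing any task identifier) pass through unchanged.
        if not all(field in record for field in TASK_FIELDS):
            return record
        log_id = self.template.format(**{f: record[f] for f in TASK_FIELDS})
        record["log_id"] = log_id
        record["offset"] = next(self._counters[log_id])
        return record


enricher = LogEnricher()
# Task fields from the example line; the messages are made up.
base = {
    "dag_id": "hello_world",
    "task_id": "hello_task_3",
    "execution_date": "2020_08_19T14_26_07_226064",
    "try_number": "1",
}
first = enricher.enrich(json.dumps(dict(base, message="first line")))
second = enricher.enrich(json.dumps(dict(base, message="second line")))
print(first["log_id"], first["offset"])  # hello_world-hello_task_3-2020_08_19T14_26_07_226064-1 1
print(second["offset"])  # 2
```

Keeping offsets strictly increasing per log_id matters because the webserver pages through results by asking for offsets greater than the last one it displayed.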
For posterity, for anyone deploying to Kubernetes and using EFK for logging (specifically with https://github.com/fluent/fluentd-kubernetes-daemonset), this is the fluentd configuration we're using at the moment for getting log_id & offset into worker log lines:
in conjunction with the following Airflow configuration:
We switched away from EFK to Stackdriver logging a while ago, so I can't really say. It sounds like you might need to configure multiline parsing on the fluentd side, though.
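Multiline parsing means grouping physical lines into one logical record: a line that matches a "first line" pattern starts a new record, and anything else is appended to the previous one (fluentd's multiline parser works this way through its format_firstline parameter). A minimal Python sketch of the grouping follows. It assumes records start with a bracketed timestamp, as in Airflow's default plain-text log format; the pattern and sample lines are illustrative only.

```python
import re

# Assumption: each new record begins with "[YYYY-MM-DD ", as in Airflow's
# default plain-text log format; any other line is a continuation.
FIRSTLINE = re.compile(r"^\[\d{4}-\d{2}-\d{2} ")


def group_multiline(lines, firstline=FIRSTLINE):
    """Merge continuation lines into the record started by the last first line."""
    records = []
    for line in lines:
        if firstline.match(line) or not records:
            records.append(line)
        else:
            records[-1] += "\n" + line
    return records


# Made-up sample lines: a failing task followed by its traceback.
raw = [
    "[2020-08-19 14:26:08,123] {hello_world.py:10} ERROR - task failed",
    "Traceback (most recent call last):",
    '  File "hello_world.py", line 10, in hello',
    "ValueError: boom",
    "[2020-08-19 14:26:09,001] {hello_world.py:12} INFO - next record",
]
for rec in group_multiline(raw):
    print(repr(rec))
```

Without this grouping, each traceback line would become its own document, and those continuation lines carry none of the task fields needed to build log_id.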