log_id field is missing from log lines (ES remote logging)

Apache Airflow version: apache/airflow:1.10.11

Kubernetes version (if you are using kubernetes) (use kubectl version): v1.16.11-gke.5

Environment: GKE

What happened: The webserver doesn't fetch task logs from Elasticsearch.

What you expected to happen: Task logs are displayed in the webserver UI.

It seems like the webserver is trying to query task logs by the log_id field: https://github.com/apache/airflow/blob/1.10.11/airflow/utils/log/es_task_handler.py#L175

This field is missing from all log lines (which are written to stdout) when using the KubernetesExecutor. Example log line:

{"asctime": null, "filename": "standard_task_runner.py", "lineno": 77, "levelname": "INFO", "message": "Running: ['airflow', 'run', 'hello_world', 'hello_task_3', '2020-08-19T14:26:07.226064+00:00', '--job_id', '158', '--pool', 'default_pool', '--raw', '-sd', '/opt/airflow/dags/repo/dags/hello_world.py', '--cfg_path', '/tmp/tmpt7lafkaf']", "dag_id": "hello_world", "task_id": "hello_task_3", "execution_date": "2020_08_19T14_26_07_226064", "try_number": "1"}
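
For reference, the key the webserver looks for can be reproduced from fields that are already present in the example line. The sketch below assumes the default `log_id_template` of `{dag_id}-{task_id}-{execution_date}-{try_number}`; `build_log_id` is a hypothetical helper for illustration, not part of Airflow.

```python
import json

# Assumed default value of the [elasticsearch] log_id_template setting.
LOG_ID_TEMPLATE = "{dag_id}-{task_id}-{execution_date}-{try_number}"


def build_log_id(record: dict) -> str:
    """Compose the log_id the webserver would search for (hypothetical helper)."""
    return LOG_ID_TEMPLATE.format(
        dag_id=record["dag_id"],
        task_id=record["task_id"],
        execution_date=record["execution_date"],
        try_number=record["try_number"],
    )


# Trimmed copy of the example log line from the issue.
line = (
    '{"dag_id": "hello_world", "task_id": "hello_task_3", '
    '"execution_date": "2020_08_19T14_26_07_226064", "try_number": "1"}'
)
record = json.loads(line)

print("log_id" in record)    # False: the field is absent from the line
print(build_log_id(record))  # hello_world-hello_task_3-2020_08_19T14_26_07_226064-1
```

So every component of the key is in the record; only the combined `log_id` field itself is missing.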

How to reproduce it: This is the relevant configuration we have. The scheduler and webserver run separately, and tasks run using the KubernetesExecutor (all in the same cluster/namespace):

AIRFLOW__CORE__LOGGING_LEVEL: INFO
AIRFLOW__CORE__REMOTE_LOGGING: "True"
AIRFLOW__ELASTICSEARCH__HOST: http://elasticsearch.logging:9200
AIRFLOW__ELASTICSEARCH__JSON_FORMAT: "True"
AIRFLOW__ELASTICSEARCH__WRITE_STDOUT: "True"

We are using fluentd (https://github.com/fluent/fluentd-kubernetes-daemonset) to forward log lines to Elasticsearch. All task logs are written to stdout and to Elasticsearch as expected.
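
One way to check whether the documents that reach Elasticsearch carry the field is to run a query similar to the one the handler issues. This is only a sketch: it assumes the handler performs a `match_phrase` on `log_id` and sorts by `offset`, as the linked `es_task_handler.py` appears to do, and `build_log_query` is a hypothetical helper. It builds the request body only; sending it to the index's `_search` endpoint is left out.

```python
import json


def build_log_query(log_id: str) -> dict:
    """Build a search body that looks up log documents by log_id (hypothetical helper)."""
    return {
        # Assumption: the webserver matches the whole log_id as a phrase.
        "query": {"match_phrase": {"log_id": log_id}},
        # Assumption: results are ordered by the numeric offset field.
        "sort": [{"offset": {"order": "asc"}}],
    }


body = build_log_query("hello_world-hello_task_3-2020_08_19T14_26_07_226064-1")
print(json.dumps(body, indent=2))
```

If a query like this returns no hits while the task's lines are visible in the index, the documents are most likely missing `log_id`.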

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

eyalzek commented, Aug 21, 2020 (1 reaction)

For posterity: for anyone deploying to Kubernetes and using EFK for logging (specifically with https://github.com/fluent/fluentd-kubernetes-daemonset), this is the fluentd configuration we're currently using to get log_id and offset into worker log lines:

# Parse the JSON that Airflow writes to stdout (held in the "log" key)
# and merge the parsed fields into the record.
<filter kubernetes.var.log.containers.**>
  @type parser
  <parse>
    @type json
  </parse>
  emit_invalid_record_to_error false
  key_name log
  replace_invalid_sequence true
  reserve_data true
  reserve_time true
  remove_key_name_field true
</filter>

# For task log lines (records that carry "task_log"), build log_id from the
# Kubernetes labels and add a nanosecond-timestamp offset.
<filter var.log.containers.**>
  @type record_modifier
  prepare_value time = Time.now; @offset = time.to_i * (10 ** 9) + time.nsec
  remove_keys _dummy_
  <record>
    _dummy_ ${if record.has_key?('task_log'); record['log_id'] = "#{record['kubernetes']['labels']['dag_id']}-#{record['kubernetes']['labels']['task_id']}-#{record['kubernetes']['labels']['execution_date'].gsub(/_plus.+/, '').gsub(/[-\.]/, '_')}-#{record['kubernetes']['labels']['try_number']}"; record['offset'] = @offset; end; nil}
  </record>
</filter>

This is used in conjunction with the following Airflow configuration:

AIRFLOW__CORE__REMOTE_LOGGING: "True"
AIRFLOW__ELASTICSEARCH__HOST: http://elasticsearch:9200
AIRFLOW__ELASTICSEARCH__WRITE_STDOUT: "True"
AIRFLOW__ELASTICSEARCH__JSON_FIELDS: asctime, filename, lineno, levelname, message, task_log # task_log is used to tell task logs apart from airflow logs in fluentd
AIRFLOW__ELASTICSEARCH__JSON_FORMAT: "True"
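
The Ruby expression inside the `record_modifier` filter above is dense, so here is a rough Python rendering of the same transformation. It is a sketch, not the author's code: the helper names are hypothetical, and the sample `execution_date` label value is an assumption about how the label looks on a worker pod.

```python
import re
import time


def log_id_from_labels(labels: dict) -> str:
    """Mirror the Ruby string interpolation in the filter (hypothetical helper)."""
    # Drop the timezone suffix that starts at "_plus", as gsub(/_plus.+/, '') does.
    execution_date = re.sub(r"_plus.+", "", labels["execution_date"])
    # Replace "-" and "." with "_", as gsub(/[-\.]/, '_') does.
    execution_date = re.sub(r"[-.]", "_", execution_date)
    return "-".join(
        [labels["dag_id"], labels["task_id"], execution_date, labels["try_number"]]
    )


def current_offset() -> int:
    """Nanoseconds since the epoch, like time.to_i * (10 ** 9) + time.nsec."""
    return time.time_ns()


# Assumed label values for the task from the issue's example log line.
labels = {
    "dag_id": "hello_world",
    "task_id": "hello_task_3",
    "execution_date": "2020-08-19T14_26_07.226064_plus_00_00",
    "try_number": "1",
}

print(log_id_from_labels(labels))  # hello_world-hello_task_3-2020_08_19T14_26_07_226064-1
print(current_offset() > 0)        # True
```

With these assumed labels, the resulting `log_id` uses the same `execution_date` form as the field in the issue's example log line.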

eyalzek commented, Jul 14, 2021 (0 reactions)

We switched away from EFK to Stackdriver Logging a while ago, so I can't really say. This sounds like you might need to configure multiline parsing on the fluentd side, though.
