S3 remote logging not working for airflow server components

Apache Airflow version: 2.0.1

Environment:

  • Cloud provider or hardware configuration: on my laptop
  • OS (e.g. from /etc/os-release): macOS Mojave 10.14.6
  • Kernel (e.g. uname -a): Darwin Wongs-MBP 18.7.0 Darwin Kernel Version 18.7.0: Tue Jan 12 22:04:47 PST 2021; root:xnu-4903.278.56~1/RELEASE_X86_64 x86_64

What happened:

I configured remote logging to an S3 bucket, but only the logs of DAG runs appeared in the bucket. Logs of the Airflow server components (scheduler, web server, etc.) did not appear.

What you expected to happen:

All logs go to the S3 bucket.

How to reproduce it:

  1. Follow the quick start guide at https://airflow.apache.org/docs/apache-airflow/stable/start/local.html
  2. Before starting the web server, set the following variables:

     export AIRFLOW__LOGGING__REMOTE_LOGGING=True
     export AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=s3://my-bucket/
     export AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=my_remote_logging_conn_id

  3. Start the web server and set your S3 connection settings in the web server "Connections" section:

     Conn Id     my_remote_logging_conn_id
     Conn Type   S3
     Extra       {"region_name": "nyc3",
                  "host": "https://nyc3.digitaloceanspaces.com",
                  "aws_access_key_id": "xxx",
                  "aws_secret_access_key": "xxx"}

  4. Restart the web server.
  5. Start the scheduler in another console window (setting the same environment variables).
  6. Execute a DAG.
  7. Open your S3 bucket UI: only the logs of DAG runs appear.
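Steps 2 and 5 rely on the same three variables being present in every console that starts an Airflow component. One way to keep them consistent, sketched below for a POSIX shell, is to write them to a single file and source it before each `airflow webserver` or `airflow scheduler` invocation. The file name `remote-logging.env` is hypothetical. Airflow reads a setting from an environment variable named `AIRFLOW__{SECTION}__{KEY}`, so these three override `remote_logging`, `remote_base_log_folder` and `remote_log_conn_id` in the `[logging]` section.

```shell
#!/bin/sh
# Sketch: keep the remote-logging settings from step 2 in one file so the
# web server console and the scheduler console export identical values.
# The file name remote-logging.env is hypothetical.
set -eu

env_file="${TMPDIR:-/tmp}/remote-logging.env"

# AIRFLOW__{SECTION}__{KEY} overrides the [logging] option of the same name.
cat > "$env_file" <<'EOF'
export AIRFLOW__LOGGING__REMOTE_LOGGING=True
export AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=s3://my-bucket/
export AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=my_remote_logging_conn_id
EOF

# In each console you would then run, for example:
#   . "$env_file" && airflow webserver
#   . "$env_file" && airflow scheduler
. "$env_file"

# List what this shell will pass on to any Airflow process it starts.
env | grep '^AIRFLOW__LOGGING__' | LC_ALL=C sort
```

As the maintainer explains in the thread, these settings govern task logs only. Identical values in both consoles rule out a configuration mismatch, but they will not make scheduler or web server logs appear in the bucket.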

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

1 reaction
mik-laj commented, Apr 19, 2021

Airflow handles some logs in a special way, and unfortunately it is not easy to unify this because our logs have different characteristics.

Logs for tasks are saved to files because they are small and we can upload them once the task has completed. Once they are in object storage they are much easier to read, but each time a new line is added, we need to fetch the entire contents and upload a new file with the full log, because object stores have limited support for append operations. Logs for the other components, on the other hand, are an endless stream of data with no point at which we could stop and upload the content. For this reason, conventional logging tools, which are optimized for such continuous streams, are a better fit.

I don't think we can fix this problem, but we can update the documentation to better describe these project assumptions.

Related issue: https://github.com/apache/airflow/issues/10593
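The append cost described in the comment above can be illustrated with a small sketch. It assumes an object store that supports only whole-object reads and writes (no append). The bucket is simulated by a local temporary directory, and `put_object` and `append_line` are hypothetical helpers, not Airflow's or any S3 client's API. Adding one line means downloading the whole object and uploading it again, so the bytes transferred grow with the size of the log.

```shell
#!/bin/sh
# Sketch: appending to a log stored in an object store that has no append
# operation. The "bucket" is a temporary directory; objects can only be read
# whole (GET) or replaced whole (PUT). All helper names are hypothetical.
set -eu

bucket=$(mktemp -d)
trap 'rm -rf "$bucket"' EXIT
bytes_uploaded=0

# PUT: replace object $1 with stdin and count the bytes sent.
put_object() {
  cat > "$bucket/$1"
  size=$(wc -c < "$bucket/$1")
  bytes_uploaded=$(($bytes_uploaded + $size))
}

# Append one line: GET the whole object, add the line locally, PUT it all back.
append_line() {
  staging="$bucket/.staging"
  if [ -f "$bucket/$1" ]; then
    cat "$bucket/$1" > "$staging"
  else
    : > "$staging"          # object does not exist yet
  fi
  printf '%s\n' "$2" >> "$staging"
  put_object "$1" < "$staging"   # redirect keeps the counter in this shell
  rm -f "$staging"
}

for i in 1 2 3 4 5; do
  append_line task.log "line $i"
done

printf 'stored: %s bytes, uploaded in total: %s bytes\n' \
  "$(wc -c < "$bucket/task.log" | tr -d ' ')" "$bytes_uploaded"
```

Five 7-byte lines leave a 35-byte object but cost 105 bytes of uploads, and with a fixed line size the total grows quadratically with the number of appends. Uploading a task log after the task completes pays this cost once, whereas a component log that never ends would pay it on every append.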

0 reactions
kakarukeys commented, Apr 20, 2021

Since this is not classified as a bug, I will close this. Thanks for your help.

