S3 remote logging not working for airflow server components
Apache Airflow version: 2.0.1
Environment:
- Cloud provider or hardware configuration: on my laptop
- OS (e.g. from /etc/os-release): macOS Mojave 10.14.6
- Kernel (e.g. uname -a): Darwin Wongs-MBP 18.7.0 Darwin Kernel Version 18.7.0: Tue Jan 12 22:04:47 PST 2021; root:xnu-4903.278.56~1/RELEASE_X86_64 x86_64
What happened:
Configured remote logging to an S3 bucket; only the logs of DAG runs appeared in the bucket. Logs of the Airflow server components (scheduler, web server, etc.) did not appear.
What you expected to happen:
All logs, including those of the scheduler and web server, go to the S3 bucket.
How to reproduce it:
- Follow the quick start guide at https://airflow.apache.org/docs/apache-airflow/stable/start/local.html
- Before starting the web server, set the following variables:
export AIRFLOW__LOGGING__REMOTE_LOGGING=True
export AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=s3://my-bucket/
export AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=my_remote_logging_conn_id
- Start the web server and set your S3 connection settings in the web server's "Connections" section (a CLI-based sketch of the same setup follows these steps):
Conn Id: my_remote_logging_conn_id
Conn Type: S3
Extra: {"region_name": "nyc3",
"host": "https://nyc3.digitaloceanspaces.com",
"aws_access_key_id": "xxx",
"aws_secret_access_key": "xxx"}
- Restart the web server
- Start the scheduler in another console window (setting the same env variables)
- Execute a DAG
- Head to your S3 bucket UI; only the logs of DAG runs appear.
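For reference, the connection can also be created from the command line instead of the web UI. The following is a minimal sketch, assuming the Airflow 2.x "airflow connections add" command and the same AIRFLOW__LOGGING__* exports shown above; the endpoint and credentials are placeholders matching the values in the steps.

# Create the S3 connection used by remote logging (same values as the UI form above).
airflow connections add my_remote_logging_conn_id \
    --conn-type S3 \
    --conn-extra '{"region_name": "nyc3", "host": "https://nyc3.digitaloceanspaces.com", "aws_access_key_id": "xxx", "aws_secret_access_key": "xxx"}'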
Top GitHub Comments
Airflow handles some logs in a special way and, unfortunately, it is not easy to unify this because our logs have different characteristics.
Logs for tasks are saved to files because they are small and we can upload them after the task completes. Once they are in object storage they are easy to read, but each time a new line is added we would need to fetch the existing contents and upload a new file with the full log, since object storages have limited support for append operations. Logs for the other components, on the other hand, are an endless stream of data that cannot be paused so that the content can be uploaded. For this reason, conventional tools are a better fit, because they are optimized to handle such streams.
I don't think we can fix this problem, but we can update the documentation to better describe these project assumptions.
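In line with that advice, component logs can still be shipped to the bucket with a conventional tool outside Airflow. Below is a minimal sketch, assuming the scheduler's output is captured to a local file and the AWS CLI is available; the log path, bucket, endpoint, and sync interval are placeholder choices, not part of Airflow itself.

# Capture the scheduler's stream output to a local file.
airflow scheduler > /var/log/airflow/scheduler.log 2>&1 &

# Periodically sync component logs to the bucket; --endpoint-url points the
# AWS CLI at an S3-compatible store such as DigitalOcean Spaces.
while true; do
  aws s3 sync /var/log/airflow/ s3://my-bucket/component-logs/ \
      --endpoint-url https://nyc3.digitaloceanspaces.com
  sleep 60
done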
Related issue: https://github.com/apache/airflow/issues/10593
Since this is not classified as a bug, I will close this. Thanks for your help.