Remote logging broken (Airflow 2.0.0 on GCS)
See original GitHub issue
Apache Airflow version: 2.0.0
Kubernetes version (if you are using kubernetes) (use kubectl version):
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.14-gke.1600", GitCommit:"7c407f5cc8632f9af5a2657f220963aa7f1c46e7", GitTreeState:"clean", BuildDate:"2020-12-07T09:22:27Z", GoVersion:"go1.13.15b4", Compiler:"gc", Platform:"linux/amd64"}
Environment:
- Cloud provider or hardware configuration: GKE
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others:
What happened:
Remote logging is configured as follows:
export AIRFLOW__LOGGING__REMOTE_LOGGING=True
export AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=gs://your-bucket-name
export AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=google_cloud_default
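For reference, these environment variables map to the [logging] section of airflow.cfg in 2.0; the equivalent config-file form (same values) is:

[logging]
remote_logging = True
remote_base_log_folder = gs://your-bucket-name
remote_log_conn_id = google_cloud_default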
It worked flawlessly before the upgrade to 2.0.0. Now it is utterly broken and returns strange one-line logs, all prefixed with the same error message.
All logs are broken in the same way:
*** Reading remote log from gs://<your-bucket-name>/<dag_id>/<task_id>/2020-01-24T00:00:00+00:00/1.log.
b'*** Previous log discarded: 404 GET https://storage.googleapis.com/download/storage/v1/b/<redacted>?alt=media: No such object: <redacted>/2020-01-24T00:00:00+00:00/1.log: (\'Request failed with status code\', 404, \'Expected one of\', <HTTPStatus.OK: 200>, <HTTPStatus.PARTIAL_CONTENT: 206>)\n\n[2020-12-28 14:57:51,263] {taskinstance.py:826} INFO - Dependencies all met for <TaskInstance: <redacted> 2020-01-24T00:00:00+00:00 [queued]>\n[2020-12-28 14:57:51,281] {taskinstance.py:826} INFO - Dependencies all met for <TaskInstance: <redacted> 2020-01-24T00:00:00+00:00 [queued]>\n[2020-12-28 14:57:51,281] {taskinstance.py:1017} INFO - \n--------------------------------------------------------------------------------\n[2020-12-28 14:57:51,281] {taskinstance.py:1018} INFO - Starting attempt 1 of 1\n[2020-12-28 14:57:51,281] {taskinstance.py:1019} INFO - \n--------------------------------------------------------------------------------\n[2020-12-28 14:57:51,305] {taskinstance.py:1038} <<<SNIP>>>\n'
As you can see, the actual log is there, however scrambled. At first I thought the log could not be written, but when I manually check the bucket, the log is actually there!
I suspect the GCSTaskHandler / remote logging is broken in 2.0.0.
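Two hedged observations about that output: the "Previous log discarded: 404" prefix suggests the handler tries to read any existing remote log before appending, so a 404 on a task's first attempt would be harmless noise; and the b'...' wrapper with literal \n escapes suggests the downloaded log is being interpolated as raw bytes instead of decoded text. A minimal illustration of the second point (not Airflow's actual code):

# Hypothetical illustration of the formatting symptom, not GCSTaskHandler itself.
remote_log = b"[2020-12-28 14:57:51,263] {taskinstance.py:826} INFO - Dependencies all met\n"

# Interpolating raw bytes reproduces the scrambled one-line b'...' output:
print(f"*** Reading remote log from gs://bucket/1.log.\n{remote_log}")

# Decoding first yields the expected multi-line log:
print(f"*** Reading remote log from gs://bucket/1.log.\n{remote_log.decode('utf-8')}")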
What you expected to happen:
Remote logging to GCS works as advertised.
How to reproduce it:
Get a GCS bucket.
Configure remote logging to GCS:
export AIRFLOW__LOGGING__REMOTE_LOGGING=True
export AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=gs://your-bucket-name
export AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=google_cloud_default
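Then run any task and open its log in the web UI. A trivial DAG is enough to trigger a remote log write; a sketch (dag and task names are placeholders):

# Hedged sketch of a minimal DAG whose single task emits one log line.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

with DAG(
    dag_id="remote_logging_smoke_test",  # placeholder name
    start_date=datetime(2020, 1, 24),
    schedule_interval=None,
    catchup=False,
) as dag:
    PythonOperator(
        task_id="say_hello",  # placeholder name
        python_callable=lambda: print("hello from the task"),
    )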
@boring-cyborg who put you in charge? I followed the issue template!
@potiuk I still have the same problem. The stray quote in the REMOTE_BASE_LOG_FOLDER line was just a typo in my issue, and I have updated the issue accordingly; the problem itself persists. I do not have a permission problem, and the logs do get written! I would appreciate some pointers on how to debug and narrow down the problem instead of a blunt rejection of it.
Here is the log file written to the bucket: [screenshot]
Here is the same log output from Airflow, just scrambled: [screenshot]
Here is a screenshot showing the Airflow service account has “Storage Admin” on the same bucket:
Like I said, I am very willing to help debug the problem; please just don’t close the issue…
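One way to confirm the object exists and is readable independently of Airflow is to fetch it with the google-cloud-storage client directly. A minimal sketch, assuming application-default credentials; the bucket and object names are placeholders taken from the log path above:

# Hedged sketch: read the task log straight from GCS, bypassing Airflow.
# "your-bucket-name" and the object path are placeholders.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("your-bucket-name")
blob = bucket.blob("<dag_id>/<task_id>/2020-01-24T00:00:00+00:00/1.log")
print(blob.exists(client))                        # True if the log object is there
print(blob.download_as_string().decode("utf-8"))  # readable multi-line text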