
Airflow 2.0 cannot read remote log from GCP GCS

See original GitHub issue

Apache Airflow version: v2.1.0.dev0

Kubernetes version (if you are using kubernetes) (use kubectl version):

Environment:

  • Cloud provider or hardware configuration: Docker on GKE
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

What happened:

Here is my logging configuration in airflow.cfg:

[logging]
# The folder where airflow should store its log files
# This path must be absolute
base_log_folder = /opt/airflow/logs

# Airflow can store logs remotely in AWS S3, Google Cloud Storage or Elastic Search.
# Set this to True if you want to enable remote logging.
remote_logging = True

# Users must supply an Airflow connection id that provides access to the storage
# location.
remote_log_conn_id = AIRFLOW_LOG_BUCKET

# Path to Google Credential JSON file. If omitted, authorization based on `the Application Default
# Credentials
# <https://cloud.google.com/docs/authentication/production#finding_credentials_automatically>`__ will
# be used.
google_key_path = /secrets/service_account.json

# Storage bucket URL for remote logging
# S3 buckets should start with "s3://"
# Cloudwatch log groups should start with "cloudwatch://"
# GCS buckets should start with "gs://"
# WASB buckets should start with "wasb" just to help Airflow select correct handler
# Stackdriver logs should start with "stackdriver://"
remote_base_log_folder = gs://airflow/logs

# Use server-side encryption for logs stored in S3
encrypt_s3_logs = False

# Logging level
logging_level = INFO

# Logging level for Flask-appbuilder UI
fab_logging_level = WARN

# Logging class
# Specify the class that will specify the logging configuration
# This class has to be on the python classpath
# Example: logging_config_class = my.path.default_local_settings.LOGGING_CONFIG
logging_config_class =

# Flag to enable/disable Colored logs in Console
# Colour the logs when the controlling terminal is a TTY.
colored_console_log = True

# Log format for when Colored logs is enabled
colored_log_format = [%%(blue)s%%(asctime)s%%(reset)s] {%%(blue)s%%(filename)s:%%(reset)s%%(lineno)d} %%(log_color)s%%(levelname)s%%(reset)s - %%(log_color)s%%(message)s%%(reset)s
colored_formatter_class = airflow.utils.log.colored_log.CustomTTYColoredFormatter

# Format of Log line
log_format = [%%(asctime)s] {%%(filename)s:%%(lineno)d} %%(levelname)s - %%(message)s
simple_log_format = %%(asctime)s %%(levelname)s - %%(message)s

# Specify prefix pattern like mentioned below with stream handler TaskHandlerWithCustomFormatter
# Example: task_log_prefix_template = {ti.dag_id}-{ti.task_id}-{execution_date}-{try_number}
task_log_prefix_template =

# Formatting for how airflow generates file names/paths for each task run.
log_filename_template = {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log

# Formatting for how airflow generates file names for log
log_processor_filename_template = {{ filename }}.log

# full path of dag_processor_manager logfile
dag_processor_manager_log_location = /opt/airflow/logs/dag_processor_manager/dag_processor_manager.log

# Name of handler to read task instance logs.
# Defaults to use ``task`` handler.
task_log_reader = task

# A comma-separated list of third-party logger names that will be configured to print messages to
# consoles.
# Example: extra_loggers = connexion,sqlalchemy
extra_loggers =
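
For reference, the remote-logging behaviour above depends on three values working together: remote_log_conn_id, google_key_path and remote_base_log_folder. Below is a minimal sanity-check sketch (added for illustration, not Airflow code) that uses the same key file and bucket to confirm the service account can list and write log objects; the bucket name "airflow" is only a placeholder, matching the mocked values in this config, and google-cloud-storage is assumed to be installed.

# Illustrative sanity check of the airflow.cfg values above (placeholder paths/names).
from google.cloud import storage

KEY_PATH = "/secrets/service_account.json"   # google_key_path
BUCKET = "airflow"                           # bucket part of remote_base_log_folder
PREFIX = "logs/"                             # prefix part of remote_base_log_folder

client = storage.Client.from_service_account_json(KEY_PATH)

# Read access: list a few existing log objects under the configured prefix.
for blob in client.list_blobs(BUCKET, prefix=PREFIX, max_results=5):
    print(blob.name)

# Write access: upload a marker object, as the workers do when shipping task logs.
client.bucket(BUCKET).blob(PREFIX + "_connectivity_check.txt").upload_from_string("ok")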

What you expected to happen:

I expected to be able to read the task log in the UI.

How to reproduce it:

Whenever I run any DAG, the log always shows this:

*** Unable to read remote log from gs://airflow/server/logs/dag_id/task_id/2021-02-21T00:40:00+00:00/4.log
*** maximum recursion depth exceeded while calling a Python object

*** Log file does not exist: /opt/airflow/logs/dag_id/task_id/2021-02-21T00:40:00+00:00/4.log
*** Fetching from: http://airflow-worker-deploy-685868b855-fx7cr:8793/log/dag_id/task_id/2021-02-21T00:40:00+00:00/4.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='airflow-worker-deploy-685868b855-fx7cr', port=8793): Max retries exceeded with url: /log/dag_id/task_id//2021-02-21T00:40:00+00:00/4.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fce4f00e7f0>: Failed to establish a new connection: [Errno -2] Name or service not known'))

The log files already exist on the worker and are stored in GCS.
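
The last error line above is the webserver falling back to fetching the log over HTTP from the worker's log server on port 8793, and failing because the worker hostname cannot be resolved from the webserver pod. A rough diagnostic sketch of that fallback path, run from inside the webserver container (hostname and URL copied from the error message; this is only illustrative, not Airflow code):

import socket
import requests

WORKER_HOST = "airflow-worker-deploy-685868b855-fx7cr"      # from the error message
LOG_URL = (
    "http://airflow-worker-deploy-685868b855-fx7cr:8793"
    "/log/dag_id/task_id/2021-02-21T00:40:00+00:00/4.log"    # from the error message
)

# Check whether the worker hostname resolves at all from this pod.
try:
    print("resolved to", socket.gethostbyname(WORKER_HOST))
except socket.gaierror as exc:
    print("DNS lookup failed:", exc)   # matches "Name or service not known" above

# Try the same HTTP fetch the webserver attempts as a last resort.
try:
    resp = requests.get(LOG_URL, timeout=5)
    print(resp.status_code, resp.text[:200])
except requests.ConnectionError as exc:
    print("connection failed:", exc)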

Anything else we need to know:

Here is the log from the workers:

[2021-02-22 07:50:04,666: INFO/MainProcess] Received task: airflow.executors.celery_executor.execute_command[b26e0c24-ab52-48b6-a726-0e9ccdeea124]
[2021-02-22 07:50:04,684: INFO/ForkPoolWorker-8] Executing command in Celery: ['airflow', 'tasks', 'run', 'link_account_bq2mysql', 'new_account_alerting_mysql', '2021-02-22T07:49:00+00:00', '--local', '--pool', 'default_pool', '--subdir', '/opt/airflow/dags/new_account_linking_bq2mysql/link_account_bq2mysql.py']
[2021-02-22 07:50:04,781: INFO/ForkPoolWorker-8] Filling up the DagBag from /opt/airflow/dags/new_account_linking_bq2mysql/link_account_bq2mysql.py
[2021-02-22 07:50:05,438: WARNING/ForkPoolWorker-8] /home/airflow/.local/lib/python3.8/site-packages/airflow/utils/decorators.py:94 DeprecationWarning: provide_context is deprecated as of 2.0 and is no longer required
[2021-02-22 07:50:05,563: WARNING/ForkPoolWorker-8] Running <TaskInstance: link_account_bq2mysql.new_account_alerting_mysql 2021-02-22T07:49:00+00:00 [queued]> on host airflow-worker-deploy-685868b855-96vkq
[2021-02-22 07:50:06,215: INFO/ForkPoolWorker-8] Previous log discarded: 404 GET https://storage.googleapis.com/download/storage/v1/b/airflow/o/server%2Flogs%2Flink_account_bq2mysql%2Fnew_account_alerting_mysql%2F2021-02-22T07%3A49%3A00%2B00%3A00%2F1.log?alt=media: No such object: airflow/server/logs/link_account_bq2mysql/new_account_alerting_mysql/2021-02-22T07:49:00+00:00/1.log: ('Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PARTIAL_CONTENT: 206>)
[2021-02-22 07:50:06,299: INFO/ForkPoolWorker-8] Task airflow.executors.celery_executor.execute_command[b26e0c24-ab52-48b6-a726-0e9ccdeea124] succeeded in 1.626359046989819s: None

I tried using Postman to call the Storage JSON API https://storage.googleapis.com/download/storage/v1/b/airflow/o/server%2Flogs%2Flink_account_bq2mysql%2Fnew_account_alerting_mysql%2F2021-02-22T07%3A49%3A00%2B00%3A00%2F1.log?alt=media. It returns 200 success and I can see the log. I am not sure why Airflow got a 404.
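
One observation added here for illustration: a Postman call may be authorized with different credentials than Airflow uses, so a closer comparison is to replay the same object fetch with the Python GCS client and the key file from google_key_path. If that also returns the object while the task handler still reports 404, the mismatch is more likely in credentials, project or bucket naming than in the object itself. A minimal sketch, reusing the placeholder bucket name and the object name from the worker log above:

from google.cloud import storage

KEY_PATH = "/secrets/service_account.json"    # google_key_path from airflow.cfg
BUCKET = "airflow"                            # placeholder bucket name
OBJECT_NAME = (
    "server/logs/link_account_bq2mysql/new_account_alerting_mysql/"
    "2021-02-22T07:49:00+00:00/1.log"         # object name from the worker log above
)

# Fetch the object with the same credentials the GCS task handler is configured to use.
client = storage.Client.from_service_account_json(KEY_PATH)
blob = client.bucket(BUCKET).blob(OBJECT_NAME)

if blob.exists():
    print(blob.download_as_bytes()[:500])
else:
    print("404: object not visible with these credentials")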

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 11 (3 by maintainers)

Top GitHub Comments

2 reactions
Rukeith commented, Feb 23, 2021

@SamWheating gs://airflow is just a mock name, not the exact name in airflow.cfg.

1 reaction
LeXy0623 commented, Nov 16, 2021

Hi, I have the same log symptom in the task log (I haven't checked the worker log yet). What is interesting is that it happens for only 1-3 tasks out of roughly 500 daily, and when and where it happens keeps changing, but for me only BigQueryOperators are affected. This is on Composer composer-1.17.0-preview.3-airflow-2.0.1.

Read more comments on GitHub >

Top Results From Across the Web

Writing logs to Google Cloud Storage - Apache Airflow
Remote logging to Google Cloud Storage uses an existing Airflow connection to read or write logs. If you don't have a connection properly...
Read more >
Unable to read logs from GCS bucket in Airflow 1.10
Remote logging to Google Cloud Storage using an existing Airflow connection to read or write logs fails. If you don't have a connection ......
Read more >
[GitHub] [airflow] Rukeith opened a new issue #14352
Set this to True if you want to enable remote logging. ... Unable to read remote log from gs://airflow/server/logs/dag_id/task_id/2021-02- ...
Read more >
Known issues | Cloud Composer
Workaround: Rename the logs generated by Airflow 1.9.0 in the Cloud Storage ... the first DAG run for it fail with the Unable...
Read more >
apache-airflow-providers-google 8.6.0 - PyPI
You can find package information and changelog for the provider in the documentation. Installation. You can install this package on top of an...
Read more >
