Airflow 2.0 cannot read remote logs from GCP GCS
Apache Airflow version: v2.1.0.dev0
Kubernetes version (if you are using kubernetes) (use kubectl version):
Environment:
- Cloud provider or hardware configuration: Docker on GKE
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others:
What happened:
Here is my logging configuration in airflow.cfg:
[logging]
# The folder where airflow should store its log files
# This path must be absolute
base_log_folder = /opt/airflow/logs
# Airflow can store logs remotely in AWS S3, Google Cloud Storage or Elastic Search.
# Set this to True if you want to enable remote logging.
remote_logging = True
# Users must supply an Airflow connection id that provides access to the storage
# location.
remote_log_conn_id = AIRFLOW_LOG_BUCKET
# Path to Google Credential JSON file. If omitted, authorization based on `the Application Default
# Credentials
# <https://cloud.google.com/docs/authentication/production#finding_credentials_automatically>`__ will
# be used.
google_key_path = /secrets/service_account.json
# Storage bucket URL for remote logging
# S3 buckets should start with "s3://"
# Cloudwatch log groups should start with "cloudwatch://"
# GCS buckets should start with "gs://"
# WASB buckets should start with "wasb" just to help Airflow select correct handler
# Stackdriver logs should start with "stackdriver://"
remote_base_log_folder = gs://airflow/logs
# Use server-side encryption for logs stored in S3
encrypt_s3_logs = False
# Logging level
logging_level = INFO
# Logging level for Flask-appbuilder UI
fab_logging_level = WARN
# Logging class
# Specify the class that will specify the logging configuration
# This class has to be on the python classpath
# Example: logging_config_class = my.path.default_local_settings.LOGGING_CONFIG
logging_config_class =
# Flag to enable/disable Colored logs in Console
# Colour the logs when the controlling terminal is a TTY.
colored_console_log = True
# Log format for when Colored logs is enabled
colored_log_format = [%%(blue)s%%(asctime)s%%(reset)s] {%%(blue)s%%(filename)s:%%(reset)s%%(lineno)d} %%(log_color)s%%(levelname)s%%(reset)s - %%(log_color)s%%(message)s%%(reset)s
colored_formatter_class = airflow.utils.log.colored_log.CustomTTYColoredFormatter
# Format of Log line
log_format = [%%(asctime)s] {%%(filename)s:%%(lineno)d} %%(levelname)s - %%(message)s
simple_log_format = %%(asctime)s %%(levelname)s - %%(message)s
# Specify prefix pattern like mentioned below with stream handler TaskHandlerWithCustomFormatter
# Example: task_log_prefix_template = {ti.dag_id}-{ti.task_id}-{execution_date}-{try_number}
task_log_prefix_template =
# Formatting for how airflow generates file names/paths for each task run.
log_filename_template = {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
# Formatting for how airflow generates file names for log
log_processor_filename_template = {{ filename }}.log
# full path of dag_processor_manager logfile
dag_processor_manager_log_location = /opt/airflow/logs/dag_processor_manager/dag_processor_manager.log
# Name of handler to read task instance logs.
# Defaults to use ``task`` handler.
task_log_reader = task
# A comma-separated list of third-party logger names that will be configured to print messages to
# consoles.
# Example: extra_loggers = connexion,sqlalchemy
extra_loggers =
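To rule out a credentials problem, here is a rough sanity check that can be run with the same key file (this assumes the google-cloud-storage client is installed; the bucket name and "logs/" prefix are just placeholders matching remote_base_log_folder above):
from google.cloud import storage

# Quick check that the key file from google_key_path can reach the log bucket.
# "airflow" and "logs/" are placeholders; substitute the real bucket and prefix.
client = storage.Client.from_service_account_json("/secrets/service_account.json")
bucket = client.bucket("airflow")
blob = bucket.blob("logs/_connectivity_check.txt")
blob.upload_from_string("ok")
print(blob.download_as_text())  # should print "ok" if both write and read work
                                # (use download_as_string() on older client versions)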
What you expected to happen:
I expected to be able to read the task logs in the UI.
How to reproduce it:
When I run any DAG, the task log always shows this:
*** Unable to read remote log from gs://airflow/server/logs/dag_id/task_id/2021-02-21T00:40:00+00:00/4.log
*** maximum recursion depth exceeded while calling a Python object
*** Log file does not exist: /opt/airflow/logs/dag_id/task_id/2021-02-21T00:40:00+00:00/4.log
*** Fetching from: http://airflow-worker-deploy-685868b855-fx7cr:8793/log/dag_id/task_id/2021-02-21T00:40:00+00:00/4.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='airflow-worker-deploy-685868b855-fx7cr', port=8793): Max retries exceeded with url: /log/dag_id/task_id//2021-02-21T00:40:00+00:00/4.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fce4f00e7f0>: Failed to establish a new connection: [Errno -2] Name or service not known'))
The log files already exist on the worker and are stored in GCS.
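The error above shows up when the UI tries to read the log, so one rough way to see which handler the webserver actually loaded is a check like the one below (assuming it is run inside the webserver container with the same airflow.cfg mounted; with remote_logging = True, the airflow.task logger should end up with the GCS task handler from the Google provider):
import logging
from airflow.logging_config import configure_logging

# Build the logging config the same way Airflow does at startup, then inspect
# which handler is attached to the task logger.
configure_logging()
print(logging.getLogger("airflow.task").handlers)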
Anything else we need to know:
Here is the log from workers:
[2021-02-22 07:50:04,666: INFO/MainProcess] Received task: airflow.executors.celery_executor.execute_command[b26e0c24-ab52-48b6-a726-0e9ccdeea124]
[2021-02-22 07:50:04,684: INFO/ForkPoolWorker-8] Executing command in Celery: ['airflow', 'tasks', 'run', 'link_account_bq2mysql', 'new_account_alerting_mysql', '2021-02-22T07:49:00+00:00', '--local', '--pool', 'default_pool', '--subdir', '/opt/airflow/dags/new_account_linking_bq2mysql/link_account_bq2mysql.py']
[2021-02-22 07:50:04,781: INFO/ForkPoolWorker-8] Filling up the DagBag from /opt/airflow/dags/new_account_linking_bq2mysql/link_account_bq2mysql.py
[2021-02-22 07:50:05,438: WARNING/ForkPoolWorker-8] /home/airflow/.local/lib/python3.8/site-packages/airflow/utils/decorators.py:94 DeprecationWarning: provide_context is deprecated as of 2.0 and is no longer required
[2021-02-22 07:50:05,563: WARNING/ForkPoolWorker-8] Running <TaskInstance: link_account_bq2mysql.new_account_alerting_mysql 2021-02-22T07:49:00+00:00 [queued]> on host airflow-worker-deploy-685868b855-96vkq
[2021-02-22 07:50:06,215: INFO/ForkPoolWorker-8] Previous log discarded: 404 GET https://storage.googleapis.com/download/storage/v1/b/airflow/o/server%2Flogs%2Flink_account_bq2mysql%2Fnew_account_alerting_mysql%2F2021-02-22T07%3A49%3A00%2B00%3A00%2F1.log?alt=media: No such object: airflow/server/logs/link_account_bq2mysql/new_account_alerting_mysql/2021-02-22T07:49:00+00:00/1.log: ('Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PARTIAL_CONTENT: 206>)
[2021-02-22 07:50:06,299: INFO/ForkPoolWorker-8] Task airflow.executors.celery_executor.execute_command[b26e0c24-ab52-48b6-a726-0e9ccdeea124] succeeded in 1.626359046989819s: None
I tried using Postman to send the same storage JSON API request: https://storage.googleapis.com/download/storage/v1/b/airflow/o/server%2Flogs%2Flink_account_bq2mysql%2Fnew_account_alerting_mysql%2F2021-02-22T07%3A49%3A00%2B00%3A00%2F1.log?alt=media
It returns 200 success and I get the log, so I am not sure why Airflow got a 404.
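For comparison with the Postman test, the same JSON API URL from the worker log can also be requested with a token minted from the worker's key file; a rough sketch (assuming the requests and google-auth packages are installed):
import requests
import google.auth.transport.requests
from google.oauth2 import service_account

# Mint an access token from the same key file the worker uses.
creds = service_account.Credentials.from_service_account_file(
    "/secrets/service_account.json",
    scopes=["https://www.googleapis.com/auth/devstorage.read_only"],
)
creds.refresh(google.auth.transport.requests.Request())

# URL copied from the worker log above.
url = (
    "https://storage.googleapis.com/download/storage/v1/b/airflow/o/"
    "server%2Flogs%2Flink_account_bq2mysql%2Fnew_account_alerting_mysql"
    "%2F2021-02-22T07%3A49%3A00%2B00%3A00%2F1.log?alt=media"
)
resp = requests.get(url, headers={"Authorization": f"Bearer {creds.token}"})
print(resp.status_code)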
Top GitHub Comments
@SamWheating gs://airflow is just a mock name, which is not the exact name in airflow.cfg.
Hi, I have same log symptom at the Task log (I didn’t check the worker log, yet). What is interesting is that it happens only 1-3 tasks per 500 daily and always changing when and where it happens but for me it is only BigQueryOperators which are afftected. It is Composer composer-1.17.0-preview.3-airflow-2.0.1