
Remote logging not working with google bucket

See original GitHub issue

Apache Airflow version: apache/airflow:1.10.12

Kubernetes version (use kubectl version): v1.15.12-gke.20

Environment: Google Kubernetes Engine

  • Cloud provider or hardware configuration: GCP

What happened: I am trying to push the DAG task logs to a Google Storage bucket using the remote logging feature. I have set up Airflow on GKE using the official helm repo and I am using the Kubernetes executor. The logs are not being shipped to the storage bucket specified in the configuration, and I am getting the following error on my worker pods:

DEBUG:root:Calling callbacks: []
Could not create a GoogleCloudStorageHook with connection id "google_cloud_default".
Please make sure that airflow[gcp] is installed and the GCS connection exists.
Could not write logs to gs://<bucket-name>/airflow/logs/test_utils/sleeps_forever/2020-11-09T17:00:45.078614+00:00/1.log: 'NoneType' object has no attribute 'upload'
DEBUG:root:Calling callbacks: []
DEBUG:airflow.settings:Disposing DB connection pool (PID 1)
stream closed
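
The two errors are linked: the second is a consequence of the first. A minimal sketch of the failure chain, paraphrasing (not quoting) what the 1.10.x GCS task handler does when it needs a hook:

    import logging

    log = logging.getLogger(__name__)

    def get_gcs_hook(conn_id="google_cloud_default"):
        # Rough paraphrase of the 1.10.x handler logic, not the verbatim Airflow source.
        try:
            # The import only succeeds if the GCP extras (airflow[gcp]) are installed.
            from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook
            # In 1.10.x, constructing the hook also resolves the Airflow connection.
            return GoogleCloudStorageHook(google_cloud_storage_conn_id=conn_id)
        except Exception:
            log.error(
                'Could not create a GoogleCloudStorageHook with connection id "%s". '
                "Please make sure that airflow[gcp] is installed and the GCS connection exists.",
                conn_id,
            )
            # Returning None here is why the later upload attempt fails with
            # "'NoneType' object has no attribute 'upload'".
            return None

So the "NoneType" message most likely means the hook was never created, either because the import failed or because the "google_cloud_default" connection could not be resolved inside the worker pod.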

I have verified that the airflow[gcp] pip package is already available in the image.
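
As a sanity check, something along these lines can be run from a Python shell inside the worker image (e.g. kubectl exec into a worker pod while a task is running); the module path and constructor argument below are the 1.10.x ones, and they exercise both halves of the error message above:

    # Fails with ImportError if airflow[gcp] is not actually installed in this image.
    from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook

    # In 1.10.x the constructor also looks up the connection, so this raises if
    # "google_cloud_default" cannot be resolved from inside this pod.
    hook = GoogleCloudStorageHook(google_cloud_storage_conn_id="google_cloud_default")
    print(hook)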

This is the config I have used in my helm values file:

    ## GCP Remote Logging
    AIRFLOW__CORE__REMOTE_LOGGING: "True"
    AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: "gs://mybucket/airflow/logs"
    AIRFLOW__CORE__REMOTE_LOG_CONN_ID: "google_cloud_default"
    AIRFLOW__WEBSERVER__LOG_FETCH_TIMEOUT_SEC: "15"
    AIRFLOW__CORE__LOGGING_LEVEL: "DEBUG"

    ## Email (SMTP)
    AIRFLOW__EMAIL__EMAIL_BACKEND: "airflow.utils.email.send_email_smtp"
    AIRFLOW__SMTP__SMTP_HOST: "smtpmail.example.com"
    AIRFLOW__SMTP__SMTP_STARTTLS: "False"
    AIRFLOW__SMTP__SMTP_SSL: "False"
    AIRFLOW__SMTP__SMTP_PORT: "25"
    AIRFLOW__SMTP__SMTP_MAIL_FROM: "admin@airflow-cluster.example.com"

    ## Disable noisy "Handling signal: ttou" Gunicorn log messages
    GUNICORN_CMD_ARGS: "--log-level WARNING"

    ## Kubernetes Executor
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY: "apache/airflow"
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG: "1.10.12-python3.7"
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_IMAGE_PULL_POLICY: "IfNotPresent"
    AIRFLOW__KUBERNETES__WORKER_PODS_CREATION_BATCH_SIZE: "10"
    AIRFLOW__KUBERNETES__DAGS_IN_IMAGE: "True"
    AIRFLOW__KUBERNETES__DAGS_VOLUME_CLAIM: "airflow-dags"
   #AIRFLOW__KUBERNETES__LOGS_VOLUME_CLAIM: "airflow-logs"
    AIRFLOW__KUBERNETES__GIT_DAGS_FOLDER_MOUNT_POINT: "/opt/airflow/dags"
    AIRFLOW__KUBERNETES__DAGS_VOLUME_SUBPATH: "repo/"
    AIRFLOW__KUBERNETES__GIT_SSH_KEY_SECRET_NAME: "airflow-secrets"
    AIRFLOW__KUBERNETES__NAMESPACE: "airflow"
    AIRFLOW__KUBERNETES__DELETE_WORKER_PODS: "True"
    AIRFLOW__KUBERNETES__RUN_AS_USER: "50000"
    AIRFLOW__KUBERNETES__WORKER_SERVICE_ACCOUNT_NAME: "airflow"
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_LOGGING: "True"
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: "gs://mybucket/airflow/logs"
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_LOG_CONN_ID: "google_cloud_default"
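
The AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__... keys above are meant to be injected as plain environment variables into the worker pods that the Kubernetes executor launches. A quick, Airflow-agnostic way to confirm whether that injection actually happened (and with the expected variable names) is to dump the environment from inside a running worker pod; a minimal sketch:

    import os

    # Run inside a worker pod (e.g. via kubectl exec while a task is active) to
    # see which Airflow-related settings actually reached the pod's environment.
    for key, value in sorted(os.environ.items()):
        if key.upper().startswith("AIRFLOW"):
            print(f"{key}={value}")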

I have also enabled Workload Identity on the pods, so any worker pod has access to write to the bucket through a service account with the Storage Admin role.

What you expected to happen:

Ideally the logs should have been written to the storage bucket. Through some trial and error I found that if I switch the executor to Celery, the logs are pushed to the bucket; with the Kubernetes executor, the upload of logs to the bucket fails.
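
Since the Celery workers upload fine and the Kubernetes-executor pods do not, one way to narrow the difference down is to compare the effective configuration each worker process actually sees. A small sketch using the 1.10.x configuration API, to be run both in a Celery worker and in a Kubernetes worker pod:

    from airflow.configuration import conf

    # Effective remote-logging settings as the task process resolves them
    # (airflow.cfg merged with environment variable overrides).
    print("remote_logging         =", conf.getboolean("core", "remote_logging"))
    print("remote_base_log_folder =", conf.get("core", "remote_base_log_folder"))
    print("remote_log_conn_id     =", conf.get("core", "remote_log_conn_id"))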

How to reproduce it:

Install the helm chart with the values mentioned above, using the command:

    helm upgrade --install "airflow" stable/airflow --version "7.13.0" --namespace "airflow" --values values.yaml

Anything else we need to know:

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (1 by maintainers)

Top GitHub Comments

1 reaction
MatteoMart1994 commented, Jun 9, 2021

> @MatteoMart1994 you're experiencing this issue in Composer version >= 1.15.0?
>
> I ask because it seems like the same issue I'm seeing on composer-1.16.1-airflow-1.10.15, and the known issue says resolved in 1.10.4 (or later).

You are right. Disabling DAG serialization, however, was a workaround proposed by Google support itself. No clue why DAG serialization breaks the Airflow UI.

0 reactions
mtsadler commented, Jun 9, 2021

@MatteoMart1994 you're experiencing this issue in Composer version >= 1.15.0?

I ask because it seems like the same issue I'm seeing on composer-1.16.1-airflow-1.10.15, and the known issue says resolved in 1.10.4 (or later).

Read more comments on GitHub >

