
S3 Remote Logging Kubernetes Executor worker task keeps waiting to send log: "acquiring 0"

See original GitHub issue

Apache Airflow version: 2.0

Kubernetes version (if you are using kubernetes) (use kubectl version): v1.16.15

Environment:

  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools: Airflow Helm Chart
  • Others:

What happened:

A task running in a worker pod created by the Kubernetes Executor keeps running without making any progress. Checking its log, I see that it is “stuck” after a [2021-02-05 01:07:17,316] {utils.py:580} DEBUG - Acquiring 0

I can see that it is able to talk to S3: it issues a HEAD request to check whether the key exists and gets a 404, which means the object does not exist yet. After that, the logs simply stop and the task appears to be waiting; nothing further is logged about what is going on.

I am using an S3 access point as the remote base log folder, and the same setup works in Airflow 1.10.14.

I am running the following, a simple DAG that should just print a statement (a hypothetical sketch of such a DAG is shown after the command output below):

airflow@testdag2dbdbscouter-b7f961ff64d6490e80c5cfa2fd33a37c:/opt/airflow$ airflow tasks run test_dag-2 dbdb-scouter now --local --pool default_pool --subdir /usr/airflow/dags/monitoring_and_alerts/test_dag2.py
[2021-02-05 02:04:22,962] {settings.py:208} DEBUG - Setting up DB connection pool (PID 185)
[2021-02-05 02:04:22,963] {settings.py:279} DEBUG - settings.prepare_engine_args(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=185
[2021-02-05 02:04:23,164] {cli_action_loggers.py:40} DEBUG - Adding <function default_action_log at 0x7f0984c30290> to pre execution callback
[2021-02-05 02:04:30,379] {cli_action_loggers.py:66} DEBUG - Calling callbacks: [<function default_action_log at 0x7f0984c30290>]
[2021-02-05 02:04:30,499] {settings.py:208} DEBUG - Setting up DB connection pool (PID 185)
[2021-02-05 02:04:30,499] {settings.py:241} DEBUG - settings.prepare_engine_args(): Using NullPool
[2021-02-05 02:04:30,500] {dagbag.py:440} INFO - Filling up the DagBag from /usr/airflow/dags/monitoring_and_alerts/test_dag2.py
[2021-02-05 02:04:30,500] {dagbag.py:279} DEBUG - Importing /usr/airflow/dags/monitoring_and_alerts/test_dag2.py
[2021-02-05 02:04:30,511] {dagbag.py:405} DEBUG - Loaded DAG <DAG: test_dag-2>
[2021-02-05 02:04:30,567] {plugins_manager.py:270} DEBUG - Loading plugins
[2021-02-05 02:04:30,567] {plugins_manager.py:207} DEBUG - Loading plugins from directory: /opt/airflow/plugins
[2021-02-05 02:04:30,567] {plugins_manager.py:184} DEBUG - Loading plugins from entrypoints
[2021-02-05 02:04:30,671] {plugins_manager.py:414} DEBUG - Integrate DAG plugins
Running <TaskInstance: test_dag-2.dbdb-scouter 2021-02-05T02:04:23.265117+00:00 [None]> on host testdag2dbdbscouter-b7f961ff64d6490e80c5cfa2fd33a37c
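
For reference, a minimal DAG of this shape might look like the following. This is only a sketch; the actual test_dag2.py is not included in the issue, and the dag_id/task_id are taken from the command above.

import pendulum
from airflow import DAG
from airflow.operators.python import PythonOperator


def say_hello():
    # The task only needs to emit something so there is a log file to upload.
    print("hello from dbdb-scouter")


with DAG(
    dag_id="test_dag-2",
    start_date=pendulum.datetime(2021, 2, 1, tz="UTC"),
    schedule_interval=None,
    catchup=False,
) as dag:
    PythonOperator(task_id="dbdb-scouter", python_callable=say_hello)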

If I check the local logs directory and open the log file, I see that it ends with:

[2021-02-05 01:07:17,314] {retryhandler.py:187} DEBUG - No retry needed.
[2021-02-05 01:07:17,314] {hooks.py:210} DEBUG - Event needs-retry.s3.HeadObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7f3e27182b80>>
[2021-02-05 01:07:17,314] {utils.py:1186} DEBUG - S3 request was previously to an accesspoint, not redirecting.
[2021-02-05 01:07:17,316] {utils.py:580} DEBUG - Acquiring 0

If I do a manual keyboard interrupt to terminate the running task, I see the following:

[2021-02-05 02:11:30,103] {hooks.py:210} DEBUG - Event needs-retry.s3.HeadObject: calling handler <botocore.retryhandler.RetryHandler object at 0x7f097f048110>
[2021-02-05 02:11:30,103] {retryhandler.py:187} DEBUG - No retry needed.
[2021-02-05 02:11:30,103] {hooks.py:210} DEBUG - Event needs-retry.s3.HeadObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7f097f0293d0>>
[2021-02-05 02:11:30,103] {utils.py:1187} DEBUG - S3 request was previously to an accesspoint, not redirecting.
[2021-02-05 02:11:30,105] {utils.py:580} DEBUG - Acquiring 0
[2021-02-05 02:11:30,105] {futures.py:277} DEBUG - TransferCoordinator(transfer_id=0) cancel(cannot schedule new futures after interpreter shutdown) called
[2021-02-05 02:11:30,105] {s3_task_handler.py:193} ERROR - Could not write logs to s3://arn:aws:s3:us-west-2:<ACCOUNT>:accesspoint:<BUCKET,PATH>/2021-02-05T02:04:23.265117+00:00/1.log
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/log/s3_task_handler.py", line 190, in s3_write
    encrypt=conf.getboolean('logging', 'ENCRYPT_S3_LOGS'),
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 61, in wrapper
    return func(*bound_args.args, **bound_args.kwargs)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 90, in wrapper
    return func(*bound_args.args, **bound_args.kwargs)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 547, in load_string
    self._upload_file_obj(file_obj, key, bucket_name, replace, encrypt, acl_policy)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 638, in _upload_file_obj
    client.upload_fileobj(file_obj, bucket_name, key, ExtraArgs=extra_args)
  File "/home/airflow/.local/lib/python3.7/site-packages/boto3/s3/inject.py", line 538, in upload_fileobj
    extra_args=ExtraArgs, subscribers=subscribers)
  File "/home/airflow/.local/lib/python3.7/site-packages/s3transfer/manager.py", line 313, in upload
    call_args, UploadSubmissionTask, extra_main_kwargs)
  File "/home/airflow/.local/lib/python3.7/site-packages/s3transfer/manager.py", line 471, in _submit_transfer
    main_kwargs=main_kwargs
  File "/home/airflow/.local/lib/python3.7/site-packages/s3transfer/futures.py", line 467, in submit
    future = ExecutorFuture(self._executor.submit(task))
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 165, in submit
    raise RuntimeError('cannot schedule new futures after '
RuntimeError: cannot schedule new futures after interpreter shutdown

My Airflow config:

[logging]
base_log_folder = /opt/airflow/logs
remote_logging = True
remote_log_conn_id = S3Conn
google_key_path =
remote_base_log_folder = s3://arn:aws:s3:us-west-2:<ACCOUNT>:accesspoint:<BUCKET>/logs
encrypt_s3_logs = False
logging_level = DEBUG
fab_logging_level = WARN
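
Given this configuration, a quick sanity check (a sketch that assumes the S3Conn connection and the amazon provider seen in the traceback are installed in the worker image; the key name here is made up) is to run the same HeadObject call from inside a worker pod and confirm the hook can reach the access point:

from airflow.providers.amazon.aws.hooks.s3 import S3Hook

# Same connection id as remote_log_conn_id above.
hook = S3Hook(aws_conn_id="S3Conn")

# check_for_key() issues the HeadObject request that appears in the DEBUG logs.
print(hook.check_for_key(
    key="logs/sanity-check.log",
    bucket_name="arn:aws:s3:us-west-2:<ACCOUNT>:accesspoint:<BUCKET>",
))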

What you expected to happen:

I expected the log to be uploaded to S3, but the task just waits at this point.

How to reproduce it:

I extended the official Airflow Docker image and baked in the test DAG:

FROM apache/airflow:2.0.0-python3.7

COPY requirements.txt /requirements.txt
RUN pip install --user -r /requirements.txt

ENV AIRFLOW_DAG_FOLDER="/usr/airflow"
COPY --chown=airflow:root ./airflow ${AIRFLOW_DAG_FOLDER}

Anything else we need to know:

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 11
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

1 reaction
gioruss commented, Feb 11, 2021

I tried the fix suggested by @dacort (update the S3 hook so it does not use threads): the worker pod now terminates instead of waiting forever, and the log files are correctly pushed to S3.
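
The traceback above is consistent with that workaround: the upload goes through boto3's managed transfer, whose worker thread pool refuses to schedule new futures once the interpreter is shutting down, which is exactly when the task handler flushes its logs. A minimal sketch of the workaround, assuming you patch or subclass the provider's S3Hook yourself (the original patch is not shown here), is to force single-threaded transfers:

from boto3.s3.transfer import TransferConfig

from airflow.providers.amazon.aws.hooks.s3 import S3Hook


class SingleThreadedS3Hook(S3Hook):
    """Variant of S3Hook that disables s3transfer's worker threads so the
    log upload performed during interpreter shutdown cannot hang."""

    def _upload_file_obj(self, file_obj, key, bucket_name=None,
                         replace=False, encrypt=False, acl_policy=None):
        if not replace and self.check_for_key(key, bucket_name):
            raise ValueError(f"The key {key} already exists.")

        extra_args = {}
        if encrypt:
            extra_args["ServerSideEncryption"] = "AES256"
        if acl_policy:
            extra_args["ACL"] = acl_policy

        self.get_conn().upload_fileobj(
            file_obj,
            bucket_name,
            key,
            ExtraArgs=extra_args,
            # Run the upload in the calling thread instead of a thread pool.
            Config=TransferConfig(use_threads=False),
        )

Note that the stock S3TaskHandler creates its own S3Hook, so to actually pick this up you would also need a custom log handler (or a local edit of the provider package) that uses this hook instead.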

1 reaction
chrismed commented, Feb 6, 2021

Another person (in the Apache Airflow Slack channel) described exactly the same issue I am seeing: https://apache-airflow.slack.com/archives/CSS36QQS1/p1612336991103900

