S3 Remote Logging Kubernetes Executor worker task keeps waiting to send log: "acquiring 0"
Apache Airflow version: 2.0
Kubernetes version (if you are using kubernetes) (use kubectl version): v1.16.15
Environment:
- Cloud provider or hardware configuration: AWS
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools: Airflow Helm Chart
- Others:
What happened:
A task in a worker pod created by the Kubernetes Executor keeps running indefinitely with no progress being made. Checking the log, I see that it is “stuck” after [2021-02-05 01:07:17,316] {utils.py:580} DEBUG - Acquiring 0
The task is able to talk to S3: it makes a HeadObject request to check whether the key exists and gets a 404, meaning the object does not exist in S3 yet. After that, the logs simply stop; the task appears to be waiting, and nothing more is logged about what is going on.
I am using an S3 access point for the remote base log folder, and the same setup works in Airflow 1.10.14.
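For reference, boto3 accepts an access point ARN anywhere a bucket name is expected, which is what the HeadObject call mentioned above is doing. A minimal sketch of that check, using a placeholder ARN and key rather than my real values:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3", region_name="us-west-2")
try:
    # The access point ARN goes where a bucket name would normally go.
    s3.head_object(
        Bucket="arn:aws:s3:us-west-2:123456789012:accesspoint:my-access-point",
        Key="logs/test_dag-2/dbdb-scouter/1.log",
    )
    print("log object already exists")
except ClientError as err:
    # A 404 here only means the log object has not been written yet.
    print("HeadObject returned:", err.response["Error"]["Code"])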
Running the following command for a simple DAG that should just print a statement:
airflow@testdag2dbdbscouter-b7f961ff64d6490e80c5cfa2fd33a37c:/opt/airflow$ airflow tasks run test_dag-2 dbdb-scouter now --local --pool default_pool --subdir /usr/airflow/dags/monitoring_and_alerts/test_dag2.py
[2021-02-05 02:04:22,962] {settings.py:208} DEBUG - Setting up DB connection pool (PID 185)
[2021-02-05 02:04:22,963] {settings.py:279} DEBUG - settings.prepare_engine_args(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=185
[2021-02-05 02:04:23,164] {cli_action_loggers.py:40} DEBUG - Adding <function default_action_log at 0x7f0984c30290> to pre execution callback
[2021-02-05 02:04:30,379] {cli_action_loggers.py:66} DEBUG - Calling callbacks: [<function default_action_log at 0x7f0984c30290>]
[2021-02-05 02:04:30,499] {settings.py:208} DEBUG - Setting up DB connection pool (PID 185)
[2021-02-05 02:04:30,499] {settings.py:241} DEBUG - settings.prepare_engine_args(): Using NullPool
[2021-02-05 02:04:30,500] {dagbag.py:440} INFO - Filling up the DagBag from /usr/airflow/dags/monitoring_and_alerts/test_dag2.py
[2021-02-05 02:04:30,500] {dagbag.py:279} DEBUG - Importing /usr/airflow/dags/monitoring_and_alerts/test_dag2.py
[2021-02-05 02:04:30,511] {dagbag.py:405} DEBUG - Loaded DAG <DAG: test_dag-2>
[2021-02-05 02:04:30,567] {plugins_manager.py:270} DEBUG - Loading plugins
[2021-02-05 02:04:30,567] {plugins_manager.py:207} DEBUG - Loading plugins from directory: /opt/airflow/plugins
[2021-02-05 02:04:30,567] {plugins_manager.py:184} DEBUG - Loading plugins from entrypoints
[2021-02-05 02:04:30,671] {plugins_manager.py:414} DEBUG - Integrate DAG plugins
Running <TaskInstance: test_dag-2.dbdb-scouter 2021-02-05T02:04:23.265117+00:00 [None]> on host testdag2dbdbscouter-b7f961ff64d6490e80c5cfa2fd33a37c
If I check the local logs directory and open the log file, I see that it ends with:
[2021-02-05 01:07:17,314] {retryhandler.py:187} DEBUG - No retry needed.
[2021-02-05 01:07:17,314] {hooks.py:210} DEBUG - Event needs-retry.s3.HeadObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7f3e27182b80>>
[2021-02-05 01:07:17,314] {utils.py:1186} DEBUG - S3 request was previously to an accesspoint, not redirecting.
[2021-02-05 01:07:17,316] {utils.py:580} DEBUG - Acquiring 0
If I do a manual keyboard interrupt to terminate the running task, I see the following:
[2021-02-05 02:11:30,103] {hooks.py:210} DEBUG - Event needs-retry.s3.HeadObject: calling handler <botocore.retryhandler.RetryHandler object at 0x7f097f048110>
[2021-02-05 02:11:30,103] {retryhandler.py:187} DEBUG - No retry needed.
[2021-02-05 02:11:30,103] {hooks.py:210} DEBUG - Event needs-retry.s3.HeadObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7f097f0293d0>>
[2021-02-05 02:11:30,103] {utils.py:1187} DEBUG - S3 request was previously to an accesspoint, not redirecting.
[2021-02-05 02:11:30,105] {utils.py:580} DEBUG - Acquiring 0
[2021-02-05 02:11:30,105] {futures.py:277} DEBUG - TransferCoordinator(transfer_id=0) cancel(cannot schedule new futures after interpreter shutdown) called
[2021-02-05 02:11:30,105] {s3_task_handler.py:193} ERROR - Could not write logs to s3://arn:aws:s3:us-west-2:<ACCOUNT>:accesspoint:<BUCKET,PATH>/2021-02-05T02:04:23.265117+00:00/1.log
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/log/s3_task_handler.py", line 190, in s3_write
encrypt=conf.getboolean('logging', 'ENCRYPT_S3_LOGS'),
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 61, in wrapper
return func(*bound_args.args, **bound_args.kwargs)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 90, in wrapper
return func(*bound_args.args, **bound_args.kwargs)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 547, in load_string
self._upload_file_obj(file_obj, key, bucket_name, replace, encrypt, acl_policy)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 638, in _upload_file_obj
client.upload_fileobj(file_obj, bucket_name, key, ExtraArgs=extra_args)
File "/home/airflow/.local/lib/python3.7/site-packages/boto3/s3/inject.py", line 538, in upload_fileobj
extra_args=ExtraArgs, subscribers=subscribers)
File "/home/airflow/.local/lib/python3.7/site-packages/s3transfer/manager.py", line 313, in upload
call_args, UploadSubmissionTask, extra_main_kwargs)
File "/home/airflow/.local/lib/python3.7/site-packages/s3transfer/manager.py", line 471, in _submit_transfer
main_kwargs=main_kwargs
File "/home/airflow/.local/lib/python3.7/site-packages/s3transfer/futures.py", line 467, in submit
future = ExecutorFuture(self._executor.submit(task))
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 165, in submit
raise RuntimeError('cannot schedule new futures after '
RuntimeError: cannot schedule new futures after interpreter shutdown
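My reading of this traceback: the S3 task handler flushes the log from its close(), which, as far as I can tell, runs here while the interpreter is already shutting down (logging.shutdown() is an atexit hook), and s3transfer then tries to submit the upload to a ThreadPoolExecutor, which refuses new work at that point. A toy sketch of that ordering, not Airflow code, just the shutdown behaviour:

import atexit


def flush_logs_at_exit():
    # Stand-in for the S3 task handler's close(): by the time this runs the
    # interpreter is shutting down and the executor refuses new work.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pool.submit(print, "pretending to upload the task log to S3")


# Register our hook before the executor module is imported, so the executor's
# own exit hook runs first and flags the interpreter as shutting down.
atexit.register(flush_logs_at_exit)

from concurrent.futures import ThreadPoolExecutor  # noqa: E402

# At interpreter exit the callback fails with:
#   RuntimeError: cannot schedule new futures after interpreter shutdown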
My Airflow config:
[logging]
base_log_folder = /opt/airflow/logs
remote_logging = True
remote_log_conn_id = S3Conn
google_key_path =
remote_base_log_folder = s3://arn:aws:s3:us-west-2:<ACCOUNT>:accesspoint:<BUCKET>/logs
encrypt_s3_logs = False
logging_level = DEBUG
fab_logging_level = WARN
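As a side note, the same upload path the task handler uses can be exercised by hand from a shell inside the worker pod, which is a quick way to check the S3Conn connection and the access point ARN outside of interpreter shutdown. A rough sketch (the key below is just an example):

from airflow.providers.amazon.aws.hooks.s3 import S3Hook

# Same connection id as remote_log_conn_id above.
hook = S3Hook(aws_conn_id="S3Conn")

# load_string() is the call the S3 task handler goes through (see the traceback
# above), so this reproduces the upload path interactively.
hook.load_string(
    string_data="manual remote-logging test",
    key="s3://arn:aws:s3:us-west-2:<ACCOUNT>:accesspoint:<BUCKET>/logs/manual-test.log",
    replace=True,
    encrypt=False,
)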
What you expected to happen:
I expected the log to be pushed to S3, but the task just waits at this point.
How to reproduce it:
I extended the Docker image and baked in the test DAG:
FROM apache/airflow:2.0.0-python3.7
COPY requirements.txt /requirements.txt
RUN pip install --user -r /requirements.txt
ENV AIRFLOW_DAG_FOLDER="/usr/airflow"
COPY --chown=airflow:root ./airflow ${AIRFLOW_DAG_FOLDER}
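The test DAG itself is not included above; a hypothetical minimal version of /usr/airflow/dags/monitoring_and_alerts/test_dag2.py that matches the dag and task ids used in the command would look roughly like this:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def print_statement():
    # The task only needs to print something to produce a log to ship to S3.
    print("hello from dbdb-scouter")


with DAG(
    dag_id="test_dag-2",
    start_date=datetime(2021, 2, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    PythonOperator(task_id="dbdb-scouter", python_callable=print_statement)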
Anything else we need to know:
Top GitHub Comments
I tried applying the fix suggested by @dacort (updating the S3 hook to not use threads): the worker pod now terminates instead of waiting forever, and the log files are correctly pushed to S3.
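For anyone wondering what that workaround looks like: boto3's managed uploads accept a TransferConfig, and use_threads=False keeps the upload in the calling thread, so nothing has to be scheduled on a ThreadPoolExecutor during interpreter shutdown. A rough sketch of the idea at the boto3 level (placeholder ARN and key; not necessarily the exact change that was merged into the provider):

import boto3
from boto3.s3.transfer import TransferConfig

# use_threads=False makes upload_fileobj() run in the calling thread instead of
# handing the work to a ThreadPoolExecutor, which is what fails at shutdown.
transfer_config = TransferConfig(use_threads=False)

s3 = boto3.client("s3", region_name="us-west-2")
with open("/opt/airflow/logs/1.log", "rb") as log_file:
    s3.upload_fileobj(
        log_file,
        "arn:aws:s3:us-west-2:123456789012:accesspoint:my-access-point",  # placeholder
        "logs/test_dag-2/dbdb-scouter/1.log",  # placeholder key
        Config=transfer_config,
    )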
Another person (from the Apache Airflow Slack channel) described exactly the same issue: https://apache-airflow.slack.com/archives/CSS36QQS1/p1612336991103900