Tasks stuck indefinitely when following container logs

Apache Airflow version

2.2.4

What happened

I observed that some workers hung randomly after running for a while. Logs also stopped being reported. After some time, the pod status was “Completed” when inspected through the k8s API, but not in Airflow, which still showed “status:running” for the pod. After some investigation, the issue turned out to be in the new Kubernetes pod operator, and it depends on an open issue in the Kubernetes API.

When a log rotation event occurs in Kubernetes, the stream we consume in fetch_container_logs(follow=True,…) is no longer fed.

Therefore, the k8s pod operator hangs indefinitely in the middle of the log. Only a SIGTERM can terminate it, because log consumption blocks execute() from finishing.

Ref to the issue in kubernetes: https://github.com/kubernetes/kubernetes/issues/59902

Linking to https://github.com/apache/airflow/issues/12103 for reference, as the result is more or less the same for the end user (although the root cause is different).

What you think should happen instead

The pod operator should not hang. It could also follow the new logs from the container, but that is out of scope for Airflow, as ideally the k8s API would do it automatically.

Solution proposal

I think there are many ways to work around this on the Airflow side so that it does not hang indefinitely (for example, making fetch_container_logs non-blocking for execute and instead always blocking until status.phase.completed, as is currently done when get_logs is not true).
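To illustrate the idea, here is a minimal sketch that uses only the Python standard library and is not the operator's actual code. The names follow_logs_until_done, stalled_stream and get_phase are hypothetical; get_phase stands in for whatever call returns the pod's phase, and the sketch assumes the Kubernetes terminal phases "Succeeded" and "Failed". The log stream is consumed in a daemon thread, so a stream that stops being fed cannot keep the caller from returning once the pod has finished.

```python
import queue
import threading
import time
from typing import Callable, Iterable, Iterator, List

# Assumption: these are the pod phases treated as "finished".
TERMINAL_PHASES = {"Succeeded", "Failed"}


def follow_logs_until_done(
    log_stream: Iterable[str],
    get_phase: Callable[[], str],
    poll_interval: float = 0.05,
) -> List[str]:
    """Collect log lines until get_phase() reports a terminal phase.

    The stream is read in a daemon thread, so a stalled stream
    cannot block this function forever.
    """
    lines: "queue.Queue[str]" = queue.Queue()

    def pump() -> None:
        # This loop may block forever if the stream stops being fed.
        for line in log_stream:
            lines.put(line)

    threading.Thread(target=pump, daemon=True).start()

    collected: List[str] = []
    while True:
        # Check the phase first, then drain, so lines already queued
        # when the pod finishes are still returned.
        done = get_phase() in TERMINAL_PHASES
        try:
            while True:
                collected.append(lines.get_nowait())
        except queue.Empty:
            pass
        if done:
            return collected
        time.sleep(poll_interval)


def stalled_stream(stall: threading.Event) -> Iterator[str]:
    """Hypothetical stream that yields two lines and then stalls."""
    yield "line 1"
    yield "line 2"
    stall.wait()  # simulates a stream that is never fed again
```

With a stream like stalled_stream, a plain for loop over the lines would never return, while follow_logs_until_done returns as soon as get_phase reports a terminal phase. The trade-off in this sketch is that lines the reader thread has not yet queued at that moment are dropped.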

How to reproduce

Running multiple tasks will sooner or later trigger this. One can also configure more aggressive log rotation in k8s so that this race is triggered more often.

Operating System

Debian GNU/Linux 11 (bullseye)

Versions of Apache Airflow Providers

apache-airflow==2.2.4
apache-airflow-providers-google==6.4.0
apache-airflow-providers-cncf-kubernetes==3.0.2

However, this should be reproducible with master.

Deployment

Official Apache Airflow Helm Chart

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Issue Analytics

  • State: closed
  • Created a year ago
  • Reactions: 2
  • Comments: 15 (13 by maintainers)

Top GitHub Comments

1 reaction
potiuk commented, May 9, 2022

Cool. Assigned you 😃 !

1 reaction
schattian commented, May 9, 2022

@potiuk sure, I will submit one one of these days.
