AirflowException: Celery command failed on host: [...]
See original GitHub issue
Apache Airflow version: 2.1.0
Kubernetes version (if you are using kubernetes) (use kubectl version): N/A (ECS)
Environment: Production
- Cloud provider or hardware configuration: AWS ECS Fargate
- OS (e.g. from /etc/os-release): Ubuntu
- Kernel (e.g. uname -a):
- Install tools: poetry
- Others:
What happened:
Since upgrading to Airflow 2.x we have seen Celery-related failures, with this exception being the most prominent: AirflowException: Celery command failed on host: ip-10-0-10-110.us-west-2.compute.internal.
What you expected to happen:
No Celery command failures during normal operation.
How to reproduce it:
Deploy an Airflow cluster with a Celery + Redis executor to an ECS Fargate cluster.
Anything else we need to know:
How often does this problem occur? Once? Every time etc?
~30 times per hour.
Issue Analytics
- Created 2 years ago
- Reactions: 5
- Comments: 16 (9 by maintainers)
Top Results From Across the Web

Celery command failed - The recorded hostname does not ...
The hostname is set when the task instance runs, and is set to self.hostname = socket.getfqdn(), where socket is the python package...

Task failed without any logs -> AirflowException
Using Flower, opening the web dashboard and picking one of the failed tasks, I was able to see the following exception: AirflowException: Celery command...

Source code for airflow.executors.celery_executor
To start the celery worker, run the command: airflow celery worker """ if ... 0) if ret == 0: return msg = f"Celery...

airflow.executors.celery_executor - PythonHosted.org
CalledProcessError as e: logging.error(e) raise AirflowException('Celery command failed'). [docs]class CeleryExecutor(BaseExecutor): """ CeleryExecutor is ...

apache/incubator-airflow - Gitter
[2018-08-20 11:46:16,196] {celery_executor.py:54} ERROR - Command 'airflow run ... raise AirflowException('Celery command failed') AirflowException: Celery ...
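Putting those snippets together, the failure path is: the task instance records the worker's FQDN via socket.getfqdn(), the Celery worker forks the task command, and any nonzero exit is surfaced as the exception in this issue. A minimal sketch, assuming that behavior; the names below are illustrative, not Airflow's actual API:

    import socket
    import subprocess

    class AirflowException(Exception):
        """Stand-in for airflow.exceptions.AirflowException."""

    def run_task_command(args):
        # Per the first snippet above: the worker's FQDN is recorded when the
        # task instance runs. On ECS Fargate every task container gets its own
        # hostname, so this value is rarely stable across retries.
        recorded_hostname = socket.getfqdn()

        # Per the celery_executor snippets above: the worker runs the task
        # command and raises on any nonzero exit code.
        ret = subprocess.call(args)
        if ret != 0:
            raise AirflowException(
                f"Celery command failed on host: {recorded_hostname}"
            )

Note that the exception itself only says the forked command exited nonzero; the actual error has to be dug out of the task or worker logs on the recorded host.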
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Update: My particular issue is fixed in https://github.com/apache/airflow/pull/16860
How can I fix this? I’m using Airflow 2.1.2 on CentOS7 in a federated topology and all jobs are failing.
Root cause is the fact that the TASK args include a wrong --subdir value, passing the value of the Scheduler’s ${DAGS_HOME} instead of the Worker’s ${DAG_HOME}. Scheduler and Worker are on different hosts/domains, so ${DAG_HOME} cannot be identical. In dagbag.py, using settings.DAGS_FOLDER would fix the problem, I think:
airflow/models/dagbag.py:122 dag_folder = dag_folder or settings.DAGS_FOLDER
Error
CAUSE: Wrong --subdir value
$ airflow tasks test touch_file_mytest runme 2021-08-06T09:03:16.244034+00:00 --subdir /home/_airflowservice@schedulerdomain/dags/mytest/example.py
FIX: Drop the incorrect --subdir arg
$ airflow tasks test touch_file_mytest runme 2021-08-06T09:03:16.244034+00:00
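To confirm the same mismatch on your own workers, a quick check along these lines can help; subdir_exists_on_worker is a hypothetical helper written for this issue, not part of Airflow:

    import os

    def subdir_exists_on_worker(
        subdir: str, dags_folder: str = "/opt/airflow/dags"
    ) -> bool:
        # A scheduler-rendered path such as
        # /home/_airflowservice@schedulerdomain/dags/... will typically fail
        # both checks on a worker whose DAG root differs.
        path = os.path.expanduser(subdir)
        return os.path.exists(path) and os.path.realpath(path).startswith(
            os.path.realpath(dags_folder)
        )

If this returns False for the --subdir your tasks are queued with, you are hitting the scheduler-vs-worker path mismatch described above.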
I’m dealing with this too (but mine is 100% of the time on one system only).
After researching it looks like this is an old issue that seems to keep popping up after being “fixed”. These resources haven’t fixed my problem but there are some ideas that might be relevant to your setup:
I suspect you are also seeing the Airflow JIRA issue “If a task crashes, hostname not committed to db so logs not in the UI”, which is a symptom of the Celery command failed on host error? I am. Do you have one worker failing consistently among other workers succeeding, or are your failures on a worker that is also succeeding sometimes?