ECSOperator returns last logs when ECS task fails
Description
Currently, when the ECSOperator fails because the ECS task is not in ‘success’ state, the Airflow alert contains only a generic message like the one below, which is of little value when we want to debug things quickly:
This task is not in success state {<huge JSON from AWS containing all the ECS task details>}
Use case / motivation
This is to make it faster for people to fix an issue when a task running ECSOperator fails.
Proposal
The idea is to return the last lines of logs from CloudWatch instead (they are already printed above in the Airflow logs), so that when we receive the alert we know what failed in the ECS task without having to dig through the Airflow logs to find it. I think this feature would involve changes here:
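A minimal sketch of the proposal, assuming the CloudWatch log events are already available as an iterable of dicts with a "message" key (the shape of the events returned by the CloudWatch Logs GetLogEvents API). The names `last_log_messages`, `build_failure_message`, `ECSTaskFailed` and the `number_logs` parameter are hypothetical and are not part of the Airflow Amazon provider; `ECSTaskFailed` stands in for `AirflowException` so the snippet runs without Airflow installed.

```python
from collections import deque
from typing import Iterable, List, Mapping


class ECSTaskFailed(Exception):
    """Hypothetical stand-in for AirflowException so the sketch runs on its own."""


def last_log_messages(events: Iterable[Mapping[str, str]], number_logs: int = 10) -> List[str]:
    """Return the messages of the last `number_logs` log events.

    deque(maxlen=n) keeps only the n most recent items while consuming the
    iterable, so the full log stream never has to be held in memory.
    """
    return [event["message"] for event in deque(events, maxlen=number_logs)]


def build_failure_message(task_id: str, events: Iterable[Mapping[str, str]], number_logs: int = 10) -> str:
    """Build a short alert message from the task identifier and its last log lines."""
    lines = last_log_messages(events, number_logs)
    return f"ECS task {task_id} is not in success state. Last log lines:\n" + "\n".join(lines)


if __name__ == "__main__":
    # Fake log events standing in for what CloudWatch would return.
    events = [{"message": f"line {i}"} for i in range(1, 6)]
    try:
        raise ECSTaskFailed(build_failure_message("example-task-id", events, number_logs=2))
    except ECSTaskFailed as exc:
        print(exc)
```

With this shape, the alert shows the last few log lines (here "line 4" and "line 5") instead of the full JSON description of the task.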
Issue Analytics
- Created 2 years ago
- Reactions: 1
- Comments: 7 (7 by maintainers)
Top GitHub Comments
We can probably use `collections.deque` for better performance, but aside from that, sounds good to me! Would you be interested in opening a pull request for this?

p.s. I don't think `IndexError` can ever be raised?

Assigned you @pmalafosse
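A small comparison illustrating the suggestion to use `collections.deque`, assuming the draft collected every log message into a list and then sliced off the tail (the draft code itself is not shown here, so this is a reconstruction for illustration; both function names are hypothetical). A bounded `deque` keeps memory proportional to the number of lines requested rather than to the whole log, and neither approach raises `IndexError` when fewer lines exist than requested.

```python
from collections import deque
from typing import Iterable, List


def tail_with_list(messages: Iterable[str], n: int) -> List[str]:
    # Materializes every message first; memory grows with the full log length.
    all_messages = list(messages)
    # Slicing never raises IndexError: with fewer than n items it returns them all.
    # Caveat: for n == 0, all_messages[-0:] equals all_messages[0:], the whole list.
    return all_messages[-n:]


def tail_with_deque(messages: Iterable[str], n: int) -> List[str]:
    # A bounded deque drops its oldest item on each append once it is full,
    # so at most n messages are held at any time (n must be non-negative).
    return list(deque(messages, maxlen=n))


if __name__ == "__main__":
    logs = (f"line {i}" for i in range(1, 1001))
    print(tail_with_deque(logs, 3))  # ['line 998', 'line 999', 'line 1000']
```

The `n == 0` case is one place where the two differ: the slice returns everything, while the bounded deque returns nothing.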