Logs for EmrStepSensor
Description
Add a feature to EmrStepSensor that surfaces the Spark task URL and logs after the task has run.
Use case/motivation
After starting an EMR step with EmrAddStepsOperator, we generally use an EmrStepSensor to track the status of the step. The step ID is available to the sensor, which pokes the step at regular intervals:
[2022-04-26, 22:07:43 UTC] {base_aws.py:100} INFO - Retrieving region_name from Connection.extra_config['region_name']
[2022-04-26, 22:07:44 UTC] {emr.py:316} INFO - Poking step s-123ABC123ABC on cluster j-123ABC123ABC
[2022-04-26, 22:07:44 UTC] {emr.py:74} INFO - Job flow currently PENDING
[2022-04-26, 22:08:44 UTC] {emr.py:316} INFO - Poking step s-123ABC123ABC on cluster j-123ABC123ABC
[2022-04-26, 22:08:44 UTC] {emr.py:74} INFO - Job flow currently PENDING
[2022-04-26, 22:09:44 UTC] {emr.py:316} INFO - Poking step s-123ABC123ABC on cluster j-123ABC123ABC
[2022-04-26, 22:09:44 UTC] {emr.py:74} INFO - Job flow currently COMPLETED
[2022-04-26, 22:09:44 UTC] {base.py:251} INFO - Success criteria met. Exiting.
[2022-04-26, 22:09:44 UTC] {taskinstance.py:1288} INFO - Marking task as SUCCESS. dag_id=datapipeline_sample, task_id=calculate_pi_watch_step, execution_date=20220426T220739, start_date=20220426T220743, end_date=20220426T220944
Once the task completes, only its final status is displayed. A user who wants to review the logs of the step has to go through a multistep process to retrieve them from the EMR cluster.
It would be a great addition if EmrStepSensor reported the log URL, and possibly relayed the logs themselves into the Airflow task log, once the step finishes. This would be especially handy when many tasks fail, and would greatly improve the user experience.
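A minimal sketch of how the log location could be derived, not the implementation that the project adopted. It assumes the cluster was created with logging enabled, so that the `describe_cluster` response carries a `LogUri`, and that EMR writes step logs under `<LogUri>/<cluster-id>/steps/<step-id>/`. The function names `build_step_log_uri` and `get_step_log_uri` are hypothetical, and `emr_client` is assumed to be a boto3 EMR client or anything exposing the same `describe_cluster` method.

```python
from typing import Optional


def build_step_log_uri(log_uri: str, cluster_id: str, step_id: str) -> str:
    """Return the S3 prefix where the logs of one EMR step are assumed to live."""
    # Assumption: EMR lays out step logs as <LogUri>/<cluster-id>/steps/<step-id>/.
    return f"{log_uri.rstrip('/')}/{cluster_id}/steps/{step_id}/"


def get_step_log_uri(emr_client, cluster_id: str, step_id: str) -> Optional[str]:
    """Look up the cluster's LogUri and derive the step log prefix from it.

    Returns None when the cluster has no LogUri, i.e. logging is not configured.
    """
    cluster = emr_client.describe_cluster(ClusterId=cluster_id)["Cluster"]
    log_uri = cluster.get("LogUri")
    if not log_uri:
        return None
    return build_step_log_uri(log_uri, cluster_id, step_id)
```

A sensor could log the returned prefix next to the final step state, so that a failed task points directly at the step's log files instead of leaving the user to locate them by hand.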
Related issues
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
Issue Analytics
- State:
- Created a year ago
- Comments:9 (6 by maintainers)
Can I be assigned this issue? Thanks @eladkal 😃
Oh, TIL! @syedahsn you know what to do : )