"Failed to load task run error" occurs in "on_failure_callback" after upgrading from 2.0.0 to 2.1.2
See original GitHub issue

Apache Airflow version: 2.1.2
Kubernetes version (if you are using Kubernetes) (use kubectl version): "v1.19.6"
Environment:
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release):
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
- Kernel (e.g. uname -a):
- Install tools:
- Others:
What happened: "Failed to load task run error" occurs in the "on_failure_callback" after a task fails
What you expected to happen: The actual exception should be passed to the callback's context
How to reproduce it:

```python
def validate_parameters(**kwargs):
    conf = kwargs.get('dag_run').conf
    kwargs['ti'].xcom_push(key='conf', value=conf)
    job_identifier = conf.get('job_identifier')
    s3_connection_id = conf.get('s3_connection_id')
    if not job_identifier:
        raise ParameterValidationException(reason='job_identifier should not be empty')
    if not s3_connection_id:
        raise ParameterValidationException(reason='s3_connection_id should not be empty')

def handle_failure(context):
    exception = context.get('exception')
    # the ParameterValidationException should be caught here, but instead the value
    # is the string "Failed to load task run error"
    # ... other business logic

DEFAULT_ARGS = {
    'owner': 'boss_admin',
    'start_date': datetime(2021, 1, 2),
    'retries': 0,
    'on_failure_callback': handle_failure,
}
```
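The `ParameterValidationException` class itself is not shown in the report. A minimal hypothetical definition, consistent with the keyword-argument calls above (the `component` attribute comes from the fix snippet further down the page), might look like:

```python
# Hypothetical reconstruction of the custom exception used in the DAG above;
# the real class is not shown in the issue report.
class ParameterValidationException(Exception):
    def __init__(self, reason, component=None):
        super().__init__()  # note: no args passed up, so self.args stays empty
        self.reason = reason
        self.component = component

    def __str__(self):
        return self.reason
```

Because the DAG raises it with keyword arguments only, the exception's `args` tuple stays empty, which is exactly what matters for the pickling problem discussed in the comments below.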
Anything else we need to know:
Issue Analytics
- State:
- Created 2 years ago
- Comments: 8 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Bottom Line Up Front: I had a similar issue, and found that the cause was a third-party package whose code implemented exceptions that could not be loaded back from a pickle file. This is the most likely cause of the problem.
For me, switching to Airflow's official Helm chart wasn't an option (not running Kubernetes). I did some digging around and found that this message is produced when Airflow tries to load the original exception back from a pickle file: https://github.com/apache/airflow/blob/9922287a4f9f70b57635b04436ddc4cfca0e84d2/airflow/models/taskinstance.py#L136-L147
Unfortunately, the underlying error message is not logged. I applied a patch to my instance of Airflow to see which error was causing the pickle.loads to fail.
I found that the error occurred because the third-party tooling I was using had an exception that could be pickled but not loaded back, which is a common issue with custom exceptions in Python. I'm guessing that something similar is happening in your case.
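This failure mode can be reproduced outside Airflow. A minimal sketch, assuming a custom exception that is only ever raised with keyword arguments (as in the DAG above): pickling succeeds, because `BaseException` serializes only the positional `args` tuple, which is empty here, but unpickling fails when pickle tries to reconstruct the object by calling the class with no arguments.

```python
import pickle

class ParameterValidationException(Exception):
    """Stand-in for the custom exception in the issue; raised with keywords only."""
    def __init__(self, reason, component=None):
        super().__init__()  # args tuple stays empty
        self.reason = reason
        self.component = component

e = ParameterValidationException(reason='job_identifier should not be empty')
data = pickle.dumps(e)   # pickling succeeds

try:
    pickle.loads(data)   # reconstruction calls ParameterValidationException()
except TypeError as err:
    # fails: __init__ is missing its required 'reason' argument
    print('unpickle failed:', err)
```

This is exactly the situation `load_error_file` hits: the dump written at failure time is fine, but the load on the callback side raises, and Airflow substitutes the generic "Failed to load task run error" string.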
@Calder-Ty Thanks a lot, I have fixed my issue by adding

```python
def __reduce__(self):
    return ParameterValidationException, (self.reason, self.component)
```

to my custom exception, as the pickle library needs to unpickle the data back from the log files. https://www.synopsys.com/blogs/software-security/python-pickling/
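Putting the fix together: with `__reduce__` telling pickle exactly how to rebuild the object, the round trip works even though the exception is raised with keyword arguments. A self-contained sketch (the attribute names mirror the snippet above; the full real class is not shown in the thread):

```python
import pickle

class ParameterValidationException(Exception):
    def __init__(self, reason, component=None):
        super().__init__()
        self.reason = reason
        self.component = component

    def __reduce__(self):
        # Tell pickle to rebuild the object as cls(reason, component),
        # instead of the default cls(*self.args) with an empty args tuple.
        return ParameterValidationException, (self.reason, self.component)

e = ParameterValidationException(reason='s3_connection_id should not be empty')
restored = pickle.loads(pickle.dumps(e))
print(restored.reason)  # the original reason survives the round trip
```

An alternative with the same effect is to pass the arguments up to `BaseException` (e.g. `super().__init__(reason, component)`), so the default reduce machinery has a non-empty `args` tuple to replay.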