Evaluator fails with ValueError in Airflow when caching is turned off and retries are allowed
See original GitHub issue

When running a pipeline in Airflow, if enable_cache is set to False but retries are allowed, a component can run more than once and generate more than one artifact. A downstream component that expects exactly one artifact then fails with a ValueError.
{{base_task_runner.py:115}} INFO - Job 750: Subtask Evaluator (len(input_dict[constants.MODEL_KEY])))
{{base_task_runner.py:115}} INFO - Job 750: Subtask Evaluator ValueError: There can be only one candidate model, there are 2.
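The failure mode can be sketched with a small, framework-free simulation. The names run_trainer and resolve_model are hypothetical stand-ins, not TFX APIs; the point is that a retry after a published artifact leaves two artifacts behind, tripping the single-candidate check:

```python
# Minimal simulation of the failure: with caching off, a retried upstream
# task publishes a second artifact, and the downstream single-artifact
# check raises the ValueError seen in the Evaluator logs.

def run_trainer(artifact_store, attempt_fails):
    """Hypothetical upstream component: each (re)run publishes a new model artifact."""
    models = artifact_store.setdefault("model", [])
    models.append("model-%d" % len(models))
    if attempt_fails:
        raise RuntimeError("transient failure, Airflow will retry")

def resolve_model(artifact_store):
    """Hypothetical downstream component: expects exactly one candidate model."""
    models = artifact_store["model"]
    if len(models) != 1:
        raise ValueError(
            "There can be only one candidate model, there are %d." % len(models))
    return models[0]

store = {}
# First attempt publishes its artifact, then fails; Airflow retries the task.
try:
    run_trainer(store, attempt_fails=True)
except RuntimeError:
    pass
# The retry succeeds, but a second artifact now sits alongside the first.
run_trainer(store, attempt_fails=False)

try:
    resolve_model(store)
except ValueError as e:
    print(e)  # There can be only one candidate model, there are 2.
```

The simulation assumes the first attempt publishes its artifact before failing; if Airflow retries a task whose output already landed in the metadata store, the downstream resolver sees both copies.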
As a workaround, I’ve been setting enable_cache to True to avoid this unexpected behavior under retries.
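Why the workaround helps can be shown with a small, framework-free simulation (run_trainer_cached is a hypothetical name, not the TFX API): with caching on, a retried run finds the already-published artifact and reuses it instead of publishing another copy.

```python
# Sketch of why enable_cache=True avoids the duplicate: a cache-aware run
# checks for an existing artifact and reuses it instead of publishing again.

def run_trainer_cached(artifact_store, attempt_fails, enable_cache=True):
    """Hypothetical upstream component with a cache check before publishing."""
    models = artifact_store.setdefault("model", [])
    if not (enable_cache and models):  # cache hit: reuse the existing artifact
        models.append("model-%d" % len(models))
    if attempt_fails:
        raise RuntimeError("transient failure, Airflow will retry")

store = {}
# First attempt publishes, then fails; Airflow retries the task.
try:
    run_trainer_cached(store, attempt_fails=True)
except RuntimeError:
    pass
# The retry hits the cache and reuses the artifact instead of adding one.
run_trainer_cached(store, attempt_fails=False)

assert len(store["model"]) == 1  # exactly one candidate model survives
```

This is only an illustration of the caching semantics; in TFX the cache lookup happens against the metadata store rather than an in-memory dict.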
Issue Analytics
- State:
- Created: 3 years ago
- Comments: 21 (10 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I tried to create a reproduction of the Airflow retry with code similar to what @casassg pasted, but with a bit more determinism:
https://gist.github.com/zhitaoli/b2d92f8ad04d98d99974513563149d33
I was able to reproduce the error once, but after upgrading to TFX 1.0.0 the issue was shadowed by the following error stack:
https://gist.github.com/zhitaoli/7dbaaa42abd8aa78cb54d52a266cd0ee
I’ll dig a bit more with @hughmiao to see whether this is fixable.
The original error might be fixable with https://github.com/tensorflow/tfx/pull/4093, but without fixing the above I can’t promise anything yet.
Same issue here with tfx.orchestration.kubeflow.kubeflow_dag_runner.KubeflowDagRunner.