Evaluator fails with ValueError in Airflow when caching is turned off and retries are allowed

When running a pipeline in Airflow with enable_cache set to False and retries allowed, a component can run more than once and generate more than one artifact. A downstream component that expects exactly one artifact then fails with a ValueError.

{{base_task_runner.py:115}} INFO - Job 750: Subtask Evaluator     (len(input_dict[constants.MODEL_KEY])))
{{base_task_runner.py:115}} INFO - Job 750: Subtask Evaluator ValueError: There can be only one candidate model, there are 2.
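
The failure mode can be sketched with a toy model in plain Python. This is not TFX or ML Metadata code: ToyArtifactStore, run_trainer, run_evaluator and simulate are hypothetical names made up for illustration, and the assumption that every retry attempt registers a fresh output artifact is inferred from the description above, not from the TFX source. Only the error message mirrors the log.

```python
from collections import defaultdict

MODEL_KEY = "model"  # hypothetical stand-in for constants.MODEL_KEY


class ToyArtifactStore:
    """Toy stand-in for a metadata store; not the TFX/MLMD API."""

    def __init__(self):
        self._outputs = defaultdict(list)

    def publish(self, key, uri):
        # Record one more output artifact under the given key.
        self._outputs[key].append(uri)

    def resolve(self, key):
        # Return every artifact ever published under the key.
        return list(self._outputs[key])


def run_trainer(store, attempt):
    # Assumption: with caching off, each attempt registers a fresh artifact.
    store.publish(MODEL_KEY, f"/tmp/model/attempt_{attempt}")


def run_evaluator(store):
    # The downstream step insists on exactly one candidate model.
    input_dict = {MODEL_KEY: store.resolve(MODEL_KEY)}
    if len(input_dict[MODEL_KEY]) != 1:
        raise ValueError(
            "There can be only one candidate model, there are %d."
            % len(input_dict[MODEL_KEY])
        )
    return input_dict[MODEL_KEY][0]


def simulate(attempts):
    # Run the upstream step `attempts` times, then the downstream step once.
    store = ToyArtifactStore()
    for attempt in range(1, attempts + 1):
        run_trainer(store, attempt)
    return run_evaluator(store)
```

In this sketch, simulate(1) returns the single model path, while simulate(2) raises the same ValueError message seen in the log, because the second attempt leaves two candidate models for the downstream step to resolve.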

As a workaround, I’ve been setting enable_cache to True to avoid unexpected behavior due to retries.

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 21 (10 by maintainers)

Top GitHub Comments

zhitaoli commented, Jul 30, 2021 (1 reaction)

I tried to reproduce the Airflow retry behavior using code similar to what @casassg pasted, but made more deterministic:

https://gist.github.com/zhitaoli/b2d92f8ad04d98d99974513563149d33

I was able to reproduce the error once, but after upgrading to tfx 1.0.0 the issue was masked by the following error stack:

https://gist.github.com/zhitaoli/7dbaaa42abd8aa78cb54d52a266cd0ee

I’ll dig a bit more with @hughmiao to see whether this is fixable.

The original error might be fixable with https://github.com/tensorflow/tfx/pull/4093, but until the error above is fixed I cannot promise that.

nmelche commented, Apr 13, 2021 (1 reaction)

Same issue here with tfx.orchestration.kubeflow.kubeflow_dag_runner.KubeflowDagRunner
