[feature] allow jobs to fail


Feature Area

/area backend

What feature would you like to see?

Allow jobs to fail: if an operation fails, do not fail the pipeline. Let the pipeline continue to the next stage, which may itself fail if its prerequisites are missing.

What is the use case or pain point?

In machine learning pipelines, it is fairly common to train multiple models, or different configurations of the same model, possibly on a subset of the training data. Once trained, they are usually compared using some metric, the best model is chosen, and that model is trained on the entire training data to produce the final model.

If someone uses kfp.dsl.ParallelFor to run the different models, a failure in one of them causes the entire pipeline to fail, and the successfully trained models may be lost. Even if the next stage, which compares models using the metric, could work with only the available (i.e. successful) models, the pipeline failure wastes the time spent training them, because the whole run has to be restarted. With the requested feature, the failed operations would display a warning (maybe ⚠️) and the pipeline would continue to the next step. If that step supports comparing a subset of the models, it proceeds as if the failed models were not there. If not, it fails there.
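The requested semantics can be sketched in plain Python, independent of the KFP SDK. This is only an illustration of the behaviour described above, not KFP code: `FanOutResult`, `run_allowing_failures`, `pick_best`, and `train` are hypothetical names, and the failing branch and metric values are simulated.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FanOutResult:
    """Outcome of a fan-out stage in which individual branches may fail."""
    metrics: Dict[str, float] = field(default_factory=dict)   # successful branches
    warnings: Dict[str, str] = field(default_factory=dict)    # failed branches


def run_allowing_failures(
    configs: List[str], train: Callable[[str], float]
) -> FanOutResult:
    """Run every branch; record a failure as a warning instead of aborting."""
    result = FanOutResult()
    for name in configs:
        try:
            result.metrics[name] = train(name)
        except Exception as exc:
            # The failure stays visible to the caller, but does not stop the loop.
            result.warnings[name] = repr(exc)
    return result


def pick_best(result: FanOutResult) -> str:
    """Comparison stage: accepts any non-empty subset, fails if nothing succeeded."""
    if not result.metrics:
        raise RuntimeError("no model trained successfully")
    return max(result.metrics, key=result.metrics.get)


def train(name: str) -> float:
    """Hypothetical training step; 'model_b' simulates a failed branch."""
    if name == "model_b":
        raise ValueError("training diverged")
    return {"model_a": 0.81, "model_c": 0.87}[name]


result = run_allowing_failures(["model_a", "model_b", "model_c"], train)
best = pick_best(result)
```

Here the failed branch ends up in `result.warnings` rather than aborting the run, and the comparison step decides for itself whether the surviving subset is enough to continue.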

Very similar functionality is available in a few CI tools. For example, GitLab CI has allow_failure, Travis CI has allow_failures, etc.

Is there a workaround currently?

It is possible to do very broad top-level exception handling to suppress failures. However, this way the fact that a step failed is hidden in the logs and not displayed in the pipeline dashboard. In scheduled pipelines, where no one really goes through the logs of all “successful” pipelines, these failures will go unnoticed.
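A minimal sketch of that workaround, in plain Python rather than KFP code, shows why the failure becomes invisible. The names `suppress_failures` and `flaky_step` are hypothetical, and the error is simulated.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("pipeline_step")


def suppress_failures(step: Callable[[], float]) -> Callable[[], Optional[float]]:
    """Wrap a step so that any exception is logged and swallowed."""
    def wrapper() -> Optional[float]:
        try:
            return step()
        except Exception:
            # The only trace of the failure is this log record.
            logger.exception("step failed; failure suppressed")
            return None
    return wrapper


def flaky_step() -> float:
    """Hypothetical step body that always fails."""
    raise RuntimeError("out of memory")


safe_step = suppress_failures(flaky_step)
outcome = safe_step()  # returns normally, so the caller sees a "successful" step
```

Because `safe_step` returns normally, whatever runs it has no way to tell the step failed except by reading the logs, which is exactly the drawback described above.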


Love this idea? Give it a 👍. We prioritize fulfilling features with the most 👍.

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 6
  • Comments: 9 (2 by maintainers)

Top GitHub Comments

jszendre commented, May 9, 2022 (1 reaction)

Exit handler ops cannot explicitly depend on any previous operations so they cannot be parameterized by outputs of previous operations or be guaranteed to run after previous steps.

My use case is running integration tests that are themselves Kubeflow pipelines, and I would like to be able to verify that a task fails without the integration test failing. Configuring that in the DSL would be a lot cleaner than including it in application logic or directly in CI/CD.

marrrcin commented, Mar 7, 2022 (1 reaction)

@yarnabrina, @chensun I’ve created a pull request implementing this behaviour. I would really appreciate your feedback on it: https://github.com/kubeflow/pipelines/pull/7373.


