[bug] workflow resource leaking when no run resource found in KFP DB

What steps did you take

  1. connect to kfp-standalone-1 cluster in kfp-ci project

  2. count current workflows – 1252

    kubectl get workflow | wc -l
        1252
    
  3. confirm current workflow ages:

    kubectl get workflow | less
    

What happened:

I found many workflows older than 1d, which is our configured workflow GC time. Because they are never cleaned up, too many Pods accumulate on each node, and this crashes the GKE metrics server.
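
To put a number on "age greater than 1d" rather than paging through `less`, something like the following sketch could help. It is only an illustration: `stale_workflows` is a hypothetical helper name, not part of KFP or Argo. It assumes `kubectl` emits `creationTimestamp` in its usual UTC form (for example `2021-07-01T12:00:00Z`), so plain string comparison is enough to order timestamps.

```shell
# Hypothetical helper: print names of workflows created before a cutoff.
# Reads "name<TAB>creationTimestamp" lines on stdin. Timestamps are assumed
# to be UTC in the form YYYY-MM-DDTHH:MM:SSZ, which sorts correctly as text.
stale_workflows() {
  cutoff="$1"
  awk -F '\t' -v cutoff="$cutoff" '$2 != "" && $2 < cutoff { print $1 }'
}

# Usage against a live cluster (assumes GNU date for the -d flag):
#   cutoff="$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%SZ)"
#   kubectl get workflow \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.creationTimestamp}{"\n"}{end}' \
#     | stale_workflows "$cutoff" | wc -l
```

Piping the result into `wc -l` gives the count of workflows that have outlived the GC window. Dropping `wc -l` lists their names for further inspection.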

What did you expect to happen:

Workflows should be GCed after being persisted to KFP DB.

Environment:

  • How do you deploy Kubeflow Pipelines (KFP)? standalone
  • KFP version: 1.7.0-rc.2

Labels

/area backend

/area testing


Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 3
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction
jli commented, Jun 27, 2022

This is happening more and more frequently for my team - multiple times a week now. It’s disruptive for us, because oncall needs to look up what workflow was lost and notify users that their run is never going to work.

Are there any workarounds we could try?

0 reactions
jli commented, Feb 18, 2022

It was just confusing, and caused some delay for me.

I launched a workflow, then came to check on it hours later, but found that it hadn’t run.

The run details page wouldn’t load (constant spinner over the dag part of the page). I checked the experiment page, and my run was there but with a grey question mark status icon. When I looked at workflow objects on k8s, I couldn’t find anything.
