Unable to undeploy app after deployment with failing healthcheck

Hello, I’m running SCDF on k8s with Helm and found an issue that looks like a bug to me.

Steps to reproduce:

  1. Create the simplest stream with time and log: “time | log”.
  2. Change liveness-probe-path and readiness-probe-path to a value that intentionally breaks the health check for the time app.
  3. Update the stream.
  4. The initial version of the time app is destroyed and a new one is started. After a couple of minutes the app’s state changes to ‘failed’ and the stream’s status to ‘partial’ (you can repeat this step a couple of times to generate more failing app versions).
  5*. Optional check, not part of the reproduction: if you undeploy or destroy the stream at this stage, the deployments for both the time and log apps are deleted, which is expected. Don’t remove the stream while reproducing the issue; I mention this step only to show that deleting the stream works at this stage and that SCDF can delete a failing app.
  5. For the stream that has existed since step 1, change liveness-probe-path and readiness-probe-path for the time app back to their normal values. A new version of time is started.
  6. The previous version is still trying to start and runs into CrashLoopBackOff.
  7. Now destroy or undeploy the stream. The deployments of healthy apps are deleted, but the failing ones are preserved. I also tried stream all destroy --force, with the same results.

If multiple updates are made while the app cannot start, all of these versions are preserved after the stream is destroyed and keep occupying resources in the k8s cluster. Only manual deletion with kubectl delete deployment appDeploymentName helps.
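As a stopgap, the manual cleanup can be scripted. The following is a minimal sketch that only builds the kubectl delete deployment commands and does not run them. It assumes that leftover deployments can be recognised by a name prefix derived from the stream name; that naming convention and the example names are assumptions for illustration, not something confirmed in this issue.

```python
def cleanup_commands(deployment_names, stream_prefix):
    """Build one `kubectl delete deployment` argv list per matching deployment."""
    return [
        ["kubectl", "delete", "deployment", name]
        for name in deployment_names
        if name.startswith(stream_prefix)
    ]


# Hypothetical deployment names, e.g. copied from `kubectl get deployments`.
leftovers = [
    "testtimelogstream-time-v2",
    "testtimelogstream-time-v3",
    "unrelated-app",
]

for argv in cleanup_commands(leftovers, "testtimelogstream-"):
    # Print instead of executing, so the list can be reviewed first.
    print(" ".join(argv))
```

Printing the commands rather than executing them leaves a review step before anything is deleted from the cluster.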

In the Skipper logs I can also see lines like these:

2020-07-13 13:40:57.974 INFO 1 --- [eTaskExecutor-2] o.s.c.s.s.d.s.HandleHealthCheckStep : Release testTimeLogStream-v2 has been DEPLOYED
2020-07-13 13:40:57.974 INFO 1 --- [eTaskExecutor-2] o.s.c.s.s.d.s.HandleHealthCheckStep : Apps in release testTimeLogStream-v2 are healthy.
2020-07-13 13:40:57.984 INFO 1 --- [eTaskExecutor-2] o.s.c.s.s.d.s.HandleHealthCheckStep : Deleting changed applications from existing release testTimeLogStream-v1
2020-07-13 13:40:57.995 WARN 1 --- [eTaskExecutor-2] o.s.c.s.s.d.strategies.DeleteStep : For Release name testTimeLogStream, did not undeploy existing app time as its status is not 'deployed'.
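To find out which apps were skipped this way, the Skipper log can be scanned for that WARN message. This is a minimal sketch: the regular expression is written against the single message quoted in this issue, and the function name is made up for illustration.

```python
import re

# Matches only the message text, so the timestamp and thread-name layout
# in front of it do not matter.
SKIPPED = re.compile(
    r"For Release name (?P<release>[^,\s]+), did not undeploy existing app "
    r"(?P<app>\S+) as its status is not"
)


def skipped_apps(log_text):
    """Return (release, app) pairs that the log reports as not undeployed."""
    return [(m.group("release"), m.group("app")) for m in SKIPPED.finditer(log_text)]


sample = (
    "2020-07-13 13:40:57.995 WARN 1 --- [eTaskExecutor-2] "
    "o.s.c.s.s.d.strategies.DeleteStep : For Release name testTimeLogStream, "
    "did not undeploy existing app time as its status is not 'deployed'."
)
print(skipped_apps(sample))
```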

So this looks like expected behavior under the current DeleteStep logic. To me, however, it looks like a bug: resources are left uncleaned in k8s, while from SCDF’s perspective the stream’s status goes back to healthy.
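The effect can be illustrated with a simplified model of the behavior the WARN line describes. This is not Skipper’s actual DeleteStep code; the class, function, and deployment names are hypothetical, and the only rule taken from the log is that an app is not undeployed unless its status is ‘deployed’.

```python
from dataclasses import dataclass


@dataclass
class AppDeployment:
    name: str    # hypothetical deployment name
    status: str  # e.g. "deployed" or "failed"


def delete_step(apps):
    """Simplified model: undeploy only apps whose status is 'deployed'."""
    removed, left_behind = [], []
    for app in apps:
        if app.status == "deployed":
            removed.append(app.name)
        else:
            # Mirrors the WARN line: the app is skipped, so its
            # Kubernetes deployment stays in the cluster.
            left_behind.append(app.name)
    return removed, left_behind


apps = [
    AppDeployment("time-v2", "failed"),    # broken probe paths
    AppDeployment("time-v3", "deployed"),  # probe paths restored
    AppDeployment("log-v1", "deployed"),
]
removed, left_behind = delete_step(apps)
print("removed:", removed)
print("left behind:", left_behind)
```

Under this model every failed version ends up in the left-behind list, which matches the orphaned deployments described above.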

Let me know if I need to provide more details. The setup is pretty basic, though, and the issue is easy to reproduce.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

2 reactions
jvalkeal commented, Aug 6, 2020

Better to have these as separate issues vs. #966

0 reactions
tsimur-abayeu commented, Sep 18, 2020

Hey, @jvalkeal! Do I understand correctly that the fix will most likely be delivered in 2.7.0.M1, as #972 and #975 mention it?
