[BUG] Restarting the installation process can cause certificate problems if K8s was not fully configured

Describe the bug When rerunning the epicli installation process, the run can fail with this error:

failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"etcd-ca\")"
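The error message points at leaf certificates that no longer chain to the current etcd CA. A minimal sketch of one way to check this on an affected node is shown here. It assumes the kubeadm default layout under /etc/kubernetes/pki/etcd and an `openssl` binary on the PATH; the function name `verify_etcd_certs` is hypothetical and not part of epicli.

```python
import subprocess
from pathlib import Path
from typing import Dict

# Leaf certificates that kubeadm places next to the etcd CA by default.
ETCD_LEAF_CERTS = ("server.crt", "peer.crt", "healthcheck-client.crt")


def verify_etcd_certs(etcd_pki_dir: str = "/etc/kubernetes/pki/etcd") -> Dict[str, bool]:
    """Map each leaf certificate found to True if it verifies against ca.crt."""
    root = Path(etcd_pki_dir)
    ca = root / "ca.crt"
    results: Dict[str, bool] = {}
    for name in ETCD_LEAF_CERTS:
        cert = root / name
        if not cert.is_file():
            continue  # skip certificates that were never generated
        # `openssl verify` exits non-zero when the chain cannot be built.
        proc = subprocess.run(
            ["openssl", "verify", "-CAfile", str(ca), str(cert)],
            capture_output=True,
            text=True,
        )
        results[name] = proc.returncode == 0
    return results
```

A `False` entry in the result would suggest that the certificate was signed by a CA from an earlier, interrupted run.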

How to reproduce Steps to reproduce the behavior:

  1. execute epicli init ... (with params)
  2. if the installation from step 1 fails partway through Kubernetes component creation, execute step 1 again

Expected behavior In the epicli preflight phase there should be a task that checks whether the /etc/kubernetes/pki/ folder exists and cleans it if it does, so that all certificates from a fresh installation run are signed with the current CA certificate.
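The proposed preflight check could look roughly like the sketch below. It is an illustration only, not the epicli implementation: the function name `clean_stale_pki` and the `dry_run` switch are assumptions introduced here, and a real task would also have to make sure the node is not part of an already working cluster before deleting anything.

```python
import shutil
from pathlib import Path
from typing import List


def clean_stale_pki(pki_dir: str = "/etc/kubernetes/pki", dry_run: bool = True) -> List[str]:
    """List files left under pki_dir and remove the directory unless dry_run is set."""
    root = Path(pki_dir)
    if not root.is_dir():
        return []  # nothing left over from an earlier run
    # Collect relative paths so the caller can log what was found.
    leftovers = sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())
    if not dry_run:
        # Remove the whole tree so the next run regenerates the CA and all leaf certs.
        shutil.rmtree(root)
    return leftovers
```

Defaulting to `dry_run=True` is a deliberate choice in this sketch: it lets a preflight step report leftover certificates first and delete them only when explicitly asked to.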

Config files

Environment

  • OS: Ubuntu 18.04.4 LTS
  • epicli version: 1.0.1


DoD checklist

  • Changelog updated (if affected version was released)
  • COMPONENTS.md updated / doesn’t need to be updated
  • Automated tests passed (QA pipelines)
    • apply
    • upgrade
  • Case covered by automated test (if possible)
  • Idempotency tested
  • Documentation updated / doesn’t need to be updated
  • All conversations in PR resolved
  • Backport tasks created / doesn’t need to be backported

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
atsikham commented, Jan 4, 2022

My proposal is to check this task after #2828 as it might be related.

0 reactions
przemyslavic commented, Feb 4, 2022

Tested the apply command multiple times after cancelling the build at different stages. It went smoothly on re-apply. The task on which the first build failed may be relevant here. I would close this task and re-open it if the problem occurs again.
