[BUG] Restarting the installation process can cause certificate problems if K8s was not fully configured
Describe the bug
When rerunning the epicli installation process, it can fail with this error:
```
failed to connect to {https://127.0.0.1:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"etcd-ca\")"
```
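For context, this handshake failure usually means the etcd server certificate left on disk by the failed run was signed by a different CA than the etcd-ca the client now trusts. A minimal diagnostic sketch, assuming the default kubeadm PKI layout under /etc/kubernetes/pki/etcd/ (paths are assumptions, adjust for your cluster):

```bash
# Compare the issuer of the etcd server cert with the current etcd CA subject;
# paths assume the default kubeadm layout.
openssl x509 -in /etc/kubernetes/pki/etcd/server.crt -noout -issuer
openssl x509 -in /etc/kubernetes/pki/etcd/ca.crt -noout -subject

# Explicit check: a stale cert from a previous run fails verification here
# with a "certificate signed by unknown authority"-style error.
openssl verify -CAfile /etc/kubernetes/pki/etcd/ca.crt /etc/kubernetes/pki/etcd/server.crt
```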
How to reproduce
Steps to reproduce the behavior:
- Execute epicli init ... (with params).
- If the installation from the first step fails partway through Kubernetes component creation, execute the first step again (a hedged sketch of this flow follows the list).
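The provider, flags, and file names below are illustrative assumptions, not the reporter's actual parameters:

```bash
# Hypothetical reproduction sketch; provider, flags, and file names are
# illustrative only.
epicli init -p any -n demo   # generate a cluster definition (params assumed)
epicli apply -f demo.yml     # start the installation

# Interrupt the run (or let it fail) while the Kubernetes components are
# being created, then rerun:
epicli apply -f demo.yml     # may now fail with the etcd-ca x509 error above
```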
Expected behavior
During the epicli preflight phase there should be a task that checks for the existence of the /etc/kubernetes/pki/ folder and cleans it if it exists, to ensure that all certificates from a brand-new installation run are signed with the most current CA certificate.
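A minimal sketch of what such a preflight task could look like, written here as plain shell rather than the Ansible epicli actually uses; the admin.conf heuristic for telling a partial install apart from a healthy node is an assumption, not epicli behavior:

```bash
#!/usr/bin/env bash
# Hypothetical preflight cleanup sketch (not epicli's actual code).
# If a previous run left certificates behind but kubeadm never finished
# bootstrapping (no admin.conf), wipe the stale PKI so the next run
# re-signs everything against a single fresh CA.
set -euo pipefail

PKI_DIR=/etc/kubernetes/pki           # default kubeadm PKI location
ADMIN_CONF=/etc/kubernetes/admin.conf # written once kubeadm init succeeds

if [ -d "$PKI_DIR" ] && [ ! -f "$ADMIN_CONF" ]; then
  echo "Partial install detected: removing stale certificates in $PKI_DIR"
  rm -rf "$PKI_DIR"
fi
```

Guarding on the absence of admin.conf is deliberate: unconditionally deleting the PKI folder would break a healthy, already-bootstrapped node.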
Config files
Environment
- OS: Ubuntu 18.04.4 LTS
- epicli version: 1.0.1
DoD checklist
- Changelog updated (if affected version was released)
- COMPONENTS.md updated / doesn’t need to be updated
- Automated tests passed (QA pipelines)
- apply
- upgrade
- Case covered by automated test (if possible)
- Idempotency tested
- Documentation updated / doesn’t need to be updated
- All conversations in PR resolved
- Backport tasks created / doesn’t need to be backported
Issue Analytics
- State:
- Created 2 years ago
- Comments: 5 (3 by maintainers)
Top Results From Across the Web
Kubelet can't running after renew certificates #2054 - GitHub
After the certificates expire, execute kubeadm alpha certs renew all, and kubelet will not be able to restart normally.
Troubleshooting kubeadm | Kubernetes
This page lists some common failure scenarios and provides steps that can help you understand and fix the problem. If your problem...
Read more >Kubernetes CrashLoopBackOff Error: What It Is and How to Fix It
1. Check for “Back Off Restarting Failed Container” ... Run kubectl describe pod [name]. If you get a Liveness probe failed and...
Common problems | Elastic Cloud on Kubernetes [2.5]
Pods are not replaced after a configuration update. The update of an existing Elasticsearch cluster configuration can fail because the operator is unable...
Resolve issues when upgrading AKS hybrid - Microsoft Learn
After a successful upgrade, older versions of PowerShell are not removed ... number of Kubernetes configuration secrets is created on a cluster; Next...
Top GitHub Comments
My proposal is to check this task after #2828, as it might be related.
Tested the apply command multiple times after cancelling the build at different stages. It went smoothly on re-apply. The task on which the first build failed may be relevant here. I would close this task and re-open it if it occurs again.