
[BUG] Kubectl client outside of HA/multi-master Epiphany cluster fails to connect to server with invalid certificate


Describe the bug
On an HA / multi-master cluster, issuing kubectl commands from a machine outside the cluster (e.g. a CI agent) will sometimes fail with a certificate error. The working theory is that HAProxy on the k8s master machines routes the kubectl traffic to an API server whose certificate does not match the address configured on the external machine.
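
The address kubectl connects to comes from the kubeconfig on the external machine; the x509 error shown later in this issue appears whenever that address is not among the Subject Alternative Names (SANs) of the certificate presented by whichever API server the request lands on. A quick way to check which endpoint a given kubeconfig targets (plain kubectl, nothing Epiphany-specific):

# Print the API server endpoint used by the active kubeconfig,
# then compare it with the addresses listed in the x509 error
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'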

To Reproduce
Steps to reproduce the behavior:

  1. Build an Epiphany cluster with HA / multi-master (3 masters in this case)
  2. Copy the kube config from one of the k8s master machines to an external machine (as part of this, localhost needs to be replaced in the kube config; see the sketch after this list)
  3. Issue kubectl commands from the external machine, which will fail periodically (depending on how traffic is routed)
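
A sketch of steps 2 and 3, assuming the kubeadm default admin.conf path and the "operations" user that appears in the comments below; the HAProxy port and target IP are placeholders, not values taken from this issue:

# Copy the admin kubeconfig from one of the masters to the external machine
# (reading admin.conf on the master may require root)
scp operations@<master-ip>:/etc/kubernetes/admin.conf ~/.kube/config
# Replace the localhost endpoint with an address reachable from outside the cluster
sed -i 's|https://localhost:<haproxy-port>|https://<master-ip>:6443|' ~/.kube/config
# Repeat a few times; with the bug present this fails intermittently
kubectl get nodes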

Expected behavior
It should be possible to issue kubectl commands from the external machine that work consistently.

Config files
Key aspects of the config are:

components:
    kubernetes_master:
      count: 3
...
use_ha_control_plane: true

OS:

  • OS: Ubuntu

Cloud Environment:

  • Cloud Provider: MS Azure


cc @jsmith085 @sunshine69

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
przemyslavic commented on Aug 25, 2020

I did some testing by following the instructions posted here to reproduce the issue. I deployed an HA cluster with public IP addresses on Azure, logged into one of the machines (other than a master/node), copied admin.conf from one of the masters, replaced localhost with the private IP address of a master node, and then tried to run kubectl. I am getting the same error that is described in this task. Support for public IPs will probably be removed here for security reasons, but I think @atsikham will be able to provide more details about the fix. The result of the kubectl command:

NAME                                            STATUS   ROLES    AGE   VERSION
ci-devhaazurubuflannel-kubernetes-master-vm-0   Ready    master   21h   v1.18.6
ci-devhaazurubuflannel-kubernetes-master-vm-1   Ready    master   22h   v1.18.6
ci-devhaazurubuflannel-kubernetes-master-vm-2   Ready    master   22h   v1.18.6
ci-devhaazurubuflannel-kubernetes-node-vm-0     Ready    <none>   21h   v1.18.6
ci-devhaazurubuflannel-kubernetes-node-vm-1     Ready    <none>   21h   v1.18.6
ci-devhaazurubuflannel-kubernetes-node-vm-2     Ready    <none>   21h   v1.18.6
[operations@ci-devhaazurubuflannel-logging-vm-0 ~]$ kubectl get nodes
Unable to connect to the server: x509: certificate is valid for 10.96.0.1, 10.1.1.9, 51.xx.yy.72, 51.xx.yy.71, 51.xx.yy.68, 127.0.0.1, not 10.1.1.6
[operations@ci-devhaazurubuflannel-logging-vm-0 ~]$ kubectl get nodes
Unable to connect to the server: x509: certificate is valid for 10.96.0.1, 10.1.1.7, 127.0.0.1, 51.xx.yy.72, 51.xx.yy.71, 51.xx.yy.68, not 10.1.1.6
[operations@ci-devhaazurubuflannel-logging-vm-0 ~]$ kubectl get nodes
NAME                                            STATUS   ROLES    AGE   VERSION
ci-devhaazurubuflannel-kubernetes-master-vm-0   Ready    master   21h   v1.18.6
ci-devhaazurubuflannel-kubernetes-master-vm-1   Ready    master   22h   v1.18.6
ci-devhaazurubuflannel-kubernetes-master-vm-2   Ready    master   22h   v1.18.6
ci-devhaazurubuflannel-kubernetes-node-vm-0     Ready    <none>   21h   v1.18.6
ci-devhaazurubuflannel-kubernetes-node-vm-1     Ready    <none>   21h   v1.18.6
ci-devhaazurubuflannel-kubernetes-node-vm-2     Ready    <none>   21h   v1.18.6
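
To confirm where the mismatch comes from, the SANs of each master's serving certificate can be inspected directly; a sketch using the kubeadm default certificate path and standard openssl, with the IP as a placeholder:

# On a master: list the SANs embedded in the apiserver certificate
sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A1 'Subject Alternative Name'
# From the external machine: inspect the certificate actually presented on port 6443
openssl s_client -connect <master-private-ip>:6443 </dev/null 2>/dev/null | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'
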
0 reactions
przemyslavic commented on Aug 26, 2020

The fix has been tested. Now there should be no issues with running kubectl commands on an HA cluster.
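
The comments do not spell out the exact change that was shipped; for reference only, the generic kubeadm way to add a missing address to the API server certificate (not necessarily what Epiphany does) looks roughly like this, run on each master:

# Back up the current serving certificate and key (kubeadm default paths)
sudo mv /etc/kubernetes/pki/apiserver.crt /etc/kubernetes/pki/apiserver.key /root/
# Regenerate the apiserver certificate with the missing address as an extra SAN
sudo kubeadm init phase certs apiserver --apiserver-cert-extra-sans=<missing-ip-or-name>
# Restart the kube-apiserver static pod (or the kubelet) so the new certificate is picked up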


