
[BUG] Kubectl client outside of HA/multi-master Epiphany cluster fails to connect to server with invalid certificate


Describe the bug
On an HA / multi-master cluster, issuing kubectl commands from a machine outside the cluster (e.g. a CI agent) will sometimes fail with a certificate error. The working theory is that HAProxy on the k8s master machines routes the kubectl traffic to an API server whose certificate does not match the address configured on the external machine.
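
The address kubectl connects to comes from the kubeconfig on the external machine; the x509 error shown later in this issue appears whenever that address is not among the Subject Alternative Names (SANs) of the certificate presented by whichever API server the request lands on. A quick way to check which endpoint a given kubeconfig targets (plain kubectl, nothing Epiphany-specific):

# Print the API server endpoint used by the active kubeconfig,
# then compare it with the addresses listed in the x509 error
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'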

To Reproduce
Steps to reproduce the behavior:

  1. Build an Epiphany cluster with HA / multi-master (3 masters in this case)
  2. Copy the kube config from one of the k8s master machines to an external machine (as part of this, localhost needs to be replaced in the kube config; see the sketch after this list)
  3. Issue kubectl commands from the external machine, which will fail periodically (depending on how traffic is routed)
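
A sketch of steps 2 and 3, assuming the kubeadm default admin.conf path and the "operations" user that appears in the comments below; the HAProxy port and target IP are placeholders, not values taken from this issue:

# Copy the admin kubeconfig from one of the masters to the external machine
# (reading admin.conf on the master may require root)
scp operations@<master-ip>:/etc/kubernetes/admin.conf ~/.kube/config
# Replace the localhost endpoint with an address reachable from outside the cluster
sed -i 's|https://localhost:<haproxy-port>|https://<master-ip>:6443|' ~/.kube/config
# Repeat a few times; with the bug present this fails intermittently
kubectl get nodes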

Expected behavior
It should be possible to issue kubectl commands from the external machine that work consistently.

Config files
Key aspects of the config are:

components:
    kubernetes_master:
      count: 3
...
use_ha_control_plane: true

OS:

  • OS: Ubuntu

Cloud Environment:

  • Cloud Provider: MS Azure


cc @jsmith085 @sunshine69

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
przemyslavic commented on Aug 25, 2020

I did some testing by following the instructions posted here to reproduce the issue. I deployed an HA cluster with public IP addresses on Azure, logged into one of the machines (other than a master/node), copied admin.conf from one of the masters, replaced localhost with the private IP address of a master node, and then tried to run kubectl. I am getting the same error that is described in this task. Support for public IPs will probably be removed here for security reasons, but I think @atsikham will be able to provide more details about the fix. The result of the kubectl command:

NAME                                            STATUS   ROLES    AGE   VERSION
ci-devhaazurubuflannel-kubernetes-master-vm-0   Ready    master   21h   v1.18.6
ci-devhaazurubuflannel-kubernetes-master-vm-1   Ready    master   22h   v1.18.6
ci-devhaazurubuflannel-kubernetes-master-vm-2   Ready    master   22h   v1.18.6
ci-devhaazurubuflannel-kubernetes-node-vm-0     Ready    <none>   21h   v1.18.6
ci-devhaazurubuflannel-kubernetes-node-vm-1     Ready    <none>   21h   v1.18.6
ci-devhaazurubuflannel-kubernetes-node-vm-2     Ready    <none>   21h   v1.18.6
[operations@ci-devhaazurubuflannel-logging-vm-0 ~]$ kubectl get nodes
Unable to connect to the server: x509: certificate is valid for 10.96.0.1, 10.1.1.9, 51.xx.yy.72, 51.xx.yy.71, 51.xx.yy.68, 127.0.0.1, not 10.1.1.6
[operations@ci-devhaazurubuflannel-logging-vm-0 ~]$ kubectl get nodes
Unable to connect to the server: x509: certificate is valid for 10.96.0.1, 10.1.1.7, 127.0.0.1, 51.xx.yy.72, 51.xx.yy.71, 51.xx.yy.68, not 10.1.1.6
[operations@ci-devhaazurubuflannel-logging-vm-0 ~]$ kubectl get nodes
NAME                                            STATUS   ROLES    AGE   VERSION
ci-devhaazurubuflannel-kubernetes-master-vm-0   Ready    master   21h   v1.18.6
ci-devhaazurubuflannel-kubernetes-master-vm-1   Ready    master   22h   v1.18.6
ci-devhaazurubuflannel-kubernetes-master-vm-2   Ready    master   22h   v1.18.6
ci-devhaazurubuflannel-kubernetes-node-vm-0     Ready    <none>   21h   v1.18.6
ci-devhaazurubuflannel-kubernetes-node-vm-1     Ready    <none>   21h   v1.18.6
ci-devhaazurubuflannel-kubernetes-node-vm-2     Ready    <none>   21h   v1.18.6
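
To confirm where the mismatch comes from, the SANs of each master's serving certificate can be inspected directly; a sketch using the kubeadm default certificate path and standard openssl, with the IP as a placeholder:

# On a master: list the SANs embedded in the apiserver certificate
sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A1 'Subject Alternative Name'
# From the external machine: inspect the certificate actually presented on port 6443
openssl s_client -connect <master-private-ip>:6443 </dev/null 2>/dev/null | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'
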
0 reactions
przemyslavic commented on Aug 26, 2020

The fix has been tested. Now there should be no issues with running kubectl commands on an HA cluster.
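
The comments do not spell out the exact change that was shipped; for reference only, the generic kubeadm way to add a missing address to the API server certificate (not necessarily what Epiphany does) looks roughly like this, run on each master:

# Back up the current serving certificate and key (kubeadm default paths)
sudo mv /etc/kubernetes/pki/apiserver.crt /etc/kubernetes/pki/apiserver.key /root/
# Regenerate the apiserver certificate with the missing address as an extra SAN
sudo kubeadm init phase certs apiserver --apiserver-cert-extra-sans=<missing-ip-or-name>
# Restart the kube-apiserver static pod (or the kubelet) so the new certificate is picked up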


