MicroK8s cross-node communication not working

My service/pod is only reachable from the node it is running on.


my setup

I have three fresh and identical Ubuntu 20.04.4 LTS servers, each with its own public IP address.

I installed microk8s on all nodes by running: sudo snap install microk8s --classic

On the master node I executed microk8s add-node and joined the two other servers by executing microk8s join XXX.XXX.X.XXX:25000/92b2db237428470dc4fcfc4ebbd9dc81/2c0cb3284b05

After that, kubectl get no shows all three nodes with the status Ready, and kubectl get all --all-namespaces shows:

NAMESPACE     NAME                                          READY   STATUS    RESTARTS      AGE
kube-system   pod/calico-node-hwsvj                         1/1     Running   1 (63m ago)   72m
kube-system   pod/calico-node-zd6rc                         1/1     Running   1 (62m ago)   71m
kube-system   pod/calico-node-djkmk                         1/1     Running   1 (62m ago)   72m
kube-system   pod/calico-kube-controllers-dc44f6cdf-flj54   1/1     Running   2 (62m ago)   74m

NAMESPACE   NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
default     service/kubernetes   ClusterIP   10.152.183.1   <none>        443/TCP   75m

NAMESPACE     NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-system   daemonset.apps/calico-node   3         3         3       3            3           kubernetes.io/os=linux   75m

NAMESPACE     NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/calico-kube-controllers   1/1     1            1           75m

NAMESPACE     NAME                                                DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/calico-kube-controllers-dc44f6cdf   1         1         1       74m

Running wget --no-check-certificate https://10.152.183.1/ on any of the nodes always returns:

WARNING: cannot verify 10.152.183.1's certificate, issued by ‘CN=10.152.183.1’:
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 401 Unauthorized

Username/Password Authentication Failed.

So far everything works as expected.


problem 1

I get the IP of calico-kube-controllers by calling kubectl describe -n=kube-system pod/calico-kube-controllers-dc44f6cdf-flj54

Running wget https://10.1.50.194/ on the “master” node returns

Connecting to 10.1.50.194:443... failed: Connection refused.

and on the two other nodes

Connecting to 10.1.50.194:80... failed: Connection timed out.

As I understand it, the IP of the pod should be reachable from all nodes. Is that correct?
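
The two error messages above point at different failure modes: "Connection refused" generally means the target host answered but nothing is listening on that port, while "Connection timed out" means no answer came back at all (for example because of a missing route or dropped packets). A minimal sketch for sorting wget output into these two cases follows; the function name classify_wget_failure and its labels are made up for illustration and are not part of wget or MicroK8s.

```shell
#!/bin/sh
# classify_wget_failure: read wget output on stdin and print a rough
# interpretation of the failure mode (hypothetical helper, labels are arbitrary).
classify_wget_failure() {
    output=$(cat)
    case "$output" in
        *"Connection refused"*)
            # The host replied, but no process accepts connections on that port.
            echo "reachable-no-listener" ;;
        *"Connection timed out"*)
            # No reply at all: routing, encapsulation or firewall problem is likely.
            echo "unreachable-or-dropped" ;;
        *)
            echo "unknown" ;;
    esac
}

# The two messages quoted in this issue:
echo "Connecting to 10.1.50.194:443... failed: Connection refused." | classify_wget_failure
echo "Connecting to 10.1.50.194:80... failed: Connection timed out." | classify_wget_failure
```

Read this way, the result on the “master” node suggests the pod is reachable there and simply does not serve that port, whereas the timeouts on the other two nodes are the actual cross-node symptom.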


problem 2

I installed the following deployment by calling

kubectl apply -f ./deployment.yaml
kubectl apply -f ./service.yaml

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: test-deployment
  name: test-deployment
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-deployment
  template:
    metadata:
      labels:
        app: test-deployment
    spec:
      containers:
      - image: dontrebootme/microbot:v1
        imagePullPolicy: IfNotPresent
        name: microbot
        resources: {}
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}

# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: test-service
  namespace: default
spec:
  type: ClusterIP
  selector:
    app: test-deployment
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 80

kubectl get all --all-namespaces

NAMESPACE     NAME                                          READY   STATUS    RESTARTS      AGE
kube-system   pod/calico-node-hwsvj                         1/1     Running   1 (91m ago)   101m
kube-system   pod/calico-node-zd6rc                         1/1     Running   1 (91m ago)   100m
kube-system   pod/calico-node-djkmk                         1/1     Running   1 (91m ago)   101m
kube-system   pod/calico-kube-controllers-dc44f6cdf-flj54   1/1     Running   2 (91m ago)   103m
default       pod/test-deployment-5899c5ff7d-d442g          1/1     Running   0             59s

NAMESPACE   NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
default     service/kubernetes     ClusterIP   10.152.183.1     <none>        443/TCP   103m
default     service/test-service   ClusterIP   10.152.183.247   <none>        80/TCP    31s

NAMESPACE     NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-system   daemonset.apps/calico-node   3         3         3       3            3           kubernetes.io/os=linux   103m

NAMESPACE     NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/calico-kube-controllers   1/1     1            1           103m
default       deployment.apps/test-deployment           1/1     1            1           59s

NAMESPACE     NAME                                                DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/calico-kube-controllers-dc44f6cdf   1         1         1       103m
default       replicaset.apps/test-deployment-5899c5ff7d          1         1         1       59s

Calling wget http://10.152.183.247/ on each of the three nodes fails on two of them with

--2022-05-06 10:34:04--  http://10.152.183.247/
Connecting to 10.152.183.247:80... failed: Connection timed out.
Retrying.

and succeeds on one of them with

<!DOCTYPE html>
<html>
  <style type="text/css">
    .centered
      {
      text-align:center;
      margin-top:0px;
      margin-bottom:0px;
      padding:0px;
      }
  </style>
  <body>
    <p class="centered"><img src="microbot.png" alt="microbot"/></p>
    <p class="centered">Container hostname: test-deployment-5899c5ff7d-d442g</p>
  </body>
</html>

As I understand it, the service should be reachable from all nodes. Calling wget on the IP of the pod itself shows exactly the same behavior.


workaround

Adding hostNetwork: true to the deployment makes the service accessible from all nodes, but that seems to be the wrong way of doing it.


Does anyone have an idea how I can debug this? I am out of ideas.

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 18

Top GitHub Comments

RobinJespersen commented, May 11, 2022 (1 reaction)

Meanwhile I also replaced one node with a Debian 11 machine, but the behavior is still exactly the same.

IDevJoe commented, May 25, 2022 (0 reactions)

The route to get to the other node never gets added. Manually adding the route through ip route enables temporary communication. @balchua, any chance you could look into this further?

This is what the routing table looks like by default:

ubuntu@k81:~$ ip route
default via 10.0.0.1 dev eth0 proto static
10.0.0.0/27 dev eth0 proto kernel scope link src 10.0.0.6
blackhole 10.1.10.192/26 proto 80
10.1.10.193 dev califb3eb82ef50 scope link
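
To check a routing table like this one for missing cross-node routes, a small filter along the following lines might help. It is only a sketch: it assumes pod addresses start with 10.1. (as in the outputs in this issue) and that a healthy node carries at least one route to a remote pod block that goes "via" another host. The function name count_remote_pod_routes is hypothetical.

```shell
#!/bin/sh
# count_remote_pod_routes PREFIX: read `ip route` output on stdin and count
# routes to pod blocks (CIDR entries starting with PREFIX) that go via
# another host. Blackhole entries and local per-pod routes are ignored.
count_remote_pod_routes() {
    prefix="$1"   # assumed pod network prefix, e.g. "10.1."
    awk -v p="$prefix" '
        $1 != "blackhole" && index($1, p) == 1 && index($1, "/") > 0 && $2 == "via" { n++ }
        END { print n + 0 }
    '
}

# Routing table from the comment above: no remote pod routes are present.
count_remote_pod_routes "10.1." <<'EOF'
default via 10.0.0.1 dev eth0 proto static
10.0.0.0/27 dev eth0 proto kernel scope link src 10.0.0.6
blackhole 10.1.10.192/26 proto 80
10.1.10.193 dev califb3eb82ef50 scope link
EOF
# prints 0
```

On a live node the same check would be `ip route | count_remote_pod_routes "10.1."`. A result of 0 on a multi-node cluster is consistent with the observation that the route to the other node never gets added.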