microk8s cross node communication not working
My service / pod is only reachable from the node it is executed on.
my setup
I have three fresh and identical Ubuntu 20.04.4 LTS servers, each with its own public IP address.
I installed microk8s on all nodes by running:

```
sudo snap install microk8s --classic
```
On the master node I executed

```
microk8s add-node
```

and joined the two other servers by executing

```
microk8s join XXX.XXX.X.XXX:25000/92b2db237428470dc4fcfc4ebbd9dc81/2c0cb3284b05
```
After that, running `kubectl get no` shows all three nodes with status `Ready`.
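For reference, the nodes' internal IPs (which matter for the routing questions below) can be listed with a standard kubectl call; this is just a generic check, nothing specific to this setup:

```bash
# Show node status together with the internal/external IP of each node;
# 'microk8s kubectl' is the client bundled with the snap.
microk8s kubectl get nodes -o wide
```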
And `kubectl get all --all-namespaces` shows

```
NAMESPACE     NAME                                          READY   STATUS    RESTARTS      AGE
kube-system   pod/calico-node-hwsvj                         1/1     Running   1 (63m ago)   72m
kube-system   pod/calico-node-zd6rc                         1/1     Running   1 (62m ago)   71m
kube-system   pod/calico-node-djkmk                         1/1     Running   1 (62m ago)   72m
kube-system   pod/calico-kube-controllers-dc44f6cdf-flj54   1/1     Running   2 (62m ago)   74m

NAMESPACE   NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
default     service/kubernetes   ClusterIP   10.152.183.1   <none>        443/TCP   75m

NAMESPACE     NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-system   daemonset.apps/calico-node   3         3         3       3            3           kubernetes.io/os=linux   75m

NAMESPACE     NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/calico-kube-controllers   1/1     1            1           75m

NAMESPACE     NAME                                                DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/calico-kube-controllers-dc44f6cdf   1         1         1       74m
```
Running

```
wget --no-check-certificate https://10.152.183.1/
```

on all nodes always returns

```
WARNING: cannot verify 10.152.183.1's certificate, issued by ‘CN=10.152.183.1’:
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 401 Unauthorized
Username/Password Authentication Failed.
```
So far everything works as expected.
problem 1
I get the IP of calico-kube-controllers by calling `kubectl describe -n=kube-system pod/calico-kube-controllers-dc44f6cdf-flj54`.
Executing `wget https://10.1.50.194/` on the “master” node returns

```
Connecting to 10.1.50.194:443... failed: Connection refused.
```

and on the two other nodes

```
Connecting to 10.1.50.194:80... failed: Connection timed out.
```
To my understanding, the pod's IP should be reachable from all nodes. Is that correct?
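A few generic checks that could narrow this down; the pod CIDR prefix and interface name below are assumptions based on the default MicroK8s Calico (VXLAN) setup and may differ:

```bash
# On which node is the controller pod running, and what is its IP?
microk8s kubectl get pod -n kube-system -o wide

# On each node: is there a route to the pod subnets hosted by the *other* nodes?
# (MicroK8s uses 10.1.0.0/16 for pods by default.)
ip route | grep -E '10\.1\.'

# Is the Calico VXLAN interface present and up?
ip -d link show vxlan.calico
```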
problem 2
I installed the following deployment and service by calling

```
kubectl apply -f ./deployment.yaml
kubectl apply -f ./service.yaml
```
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: test-deployment
  name: test-deployment
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-deployment
  template:
    metadata:
      labels:
        app: test-deployment
    spec:
      containers:
        - image: dontrebootme/microbot:v1
          imagePullPolicy: IfNotPresent
          name: microbot
          resources: {}
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
```
```yaml
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: test-service
  namespace: default
spec:
  type: ClusterIP
  selector:
    app: test-deployment
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 80
```
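As a sanity check, the service's endpoints can be inspected to confirm it actually selects the pod; an empty endpoint list would point at a label mismatch rather than a networking problem:

```bash
# The ENDPOINTS column should list the pod IP and port (e.g. 10.1.x.x:80).
kubectl get endpoints test-service -n default
kubectl describe service test-service -n default
```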
`kubectl get all --all-namespaces` now shows

```
NAMESPACE     NAME                                          READY   STATUS    RESTARTS      AGE
kube-system   pod/calico-node-hwsvj                         1/1     Running   1 (91m ago)   101m
kube-system   pod/calico-node-zd6rc                         1/1     Running   1 (91m ago)   100m
kube-system   pod/calico-node-djkmk                         1/1     Running   1 (91m ago)   101m
kube-system   pod/calico-kube-controllers-dc44f6cdf-flj54   1/1     Running   2 (91m ago)   103m
default       pod/test-deployment-5899c5ff7d-d442g          1/1     Running   0             59s

NAMESPACE   NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
default     service/kubernetes     ClusterIP   10.152.183.1     <none>        443/TCP   103m
default     service/test-service   ClusterIP   10.152.183.247   <none>        80/TCP    31s

NAMESPACE     NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-system   daemonset.apps/calico-node   3         3         3       3            3           kubernetes.io/os=linux   103m

NAMESPACE     NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/calico-kube-controllers   1/1     1            1           103m
default       deployment.apps/test-deployment           1/1     1            1           59s

NAMESPACE     NAME                                                DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/calico-kube-controllers-dc44f6cdf   1         1         1       103m
default       replicaset.apps/test-deployment-5899c5ff7d          1         1         1       59s
```
Calling `wget http://10.152.183.247/` on the three nodes returns, on two of them,

```
--2022-05-06 10:34:04--  http://10.152.183.247/
Connecting to 10.152.183.247:80... failed: Connection timed out.
Retrying.
```

and on one node
```html
<!DOCTYPE html>
<html>
<style type="text/css">
.centered
{
  text-align:center;
  margin-top:0px;
  margin-bottom:0px;
  padding:0px;
}
</style>
<body>
<p class="centered"><img src="microbot.png" alt="microbot"/></p>
<p class="centered">Container hostname: test-deployment-5899c5ff7d-d442g</p>
</body>
</html>
```
To my understanding, the service should be reachable from all nodes. Calling `wget` on the IP of the pod itself shows exactly the same behavior.
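Another way to narrow this down is to query the service from a throwaway pod instead of from the host; the pod name and busybox image below are arbitrary choices:

```bash
# One-off pod that resolves the service via cluster DNS and fetches the page.
# Running it a few times may place it on different nodes, which helps compare
# pod-to-service behaviour with the node-to-service behaviour shown above.
kubectl run curl-test --rm -it --restart=Never --image=busybox -- \
  wget -qO- http://test-service.default.svc.cluster.local
```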
workaround
Adding `hostNetwork: true` to the deployment makes the service accessible from all nodes, but that seems to be the wrong way of doing it.
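For reference, this is roughly where that field sits in the deployment spec; a sketch of the workaround only, since host networking bypasses the pod network entirely:

```yaml
# deployment.yaml (workaround variant, excerpt)
spec:
  template:
    spec:
      hostNetwork: true        # pod shares the node's network namespace
      containers:
        - image: dontrebootme/microbot:v1
          name: microbot
```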
Does anyone have an idea how I can debug this? I am out of ideas.
Top GitHub Comments
Meanwhile I also replaced one node with a Debian 11, but the behavior is still exactly the same.

The route to the other node never gets added. Manually adding the route through `ip route` enables temporary communication. @balchua, any chance you could look into this further? This is what the routing table looks like by default:
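As a sketch of that manual workaround, assuming the other node's host IP and the pod CIDR block Calico assigned to it are known (both addresses below are placeholders):

```bash
# 10.1.50.192/26 stands for the pod subnet Calico assigned to the other node,
# 192.0.2.12 for that node's host IP; substitute the real values.
sudo ip route add 10.1.50.192/26 via 192.0.2.12

# Verify the route is present.
ip route | grep 10.1.50
```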