Inaccessible pods on other nodes for high availability cluster

I set up a 3-node cluster on EC2 and wanted to launch a generic application to make sure everything is accessible. I installed MicroK8s on each machine and joined them into an HA cluster. When I deployed microbot from the MicroK8s tutorial on Ubuntu, each machine could only access its own pod.

When running microk8s kubectl get pods -o wide, I get the following:

NAME                        READY   STATUS    RESTARTS   AGE    IP            NODE               NOMINATED NODE   READINESS GATES
microbot-5f5499d479-ngz56   1/1     Running   0          179m   10.1.94.76    ip-172-31-18-128   <none>           <none>
microbot-5f5499d479-nkctv   1/1     Running   1          175m   10.1.162.72   ip-172-31-21-37    <none>           <none>
microbot-5f5499d479-zkjcn   1/1     Running   1          175m   10.1.162.73   ip-172-31-21-37    <none>           <none>

This is with the deployment scaled to 3.

If I curl from the machine ending in 128, I have a 1/3 chance of hitting its own pod and a 2/3 chance of hitting a pod on the machine ending in 37. On my third machine, curl always hangs, because the request is routed to one of the other two machines and apparently cannot reach it.
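To narrow down which pods are unreachable from a given node, it can help to probe each pod IP directly rather than going through the service. The following is a minimal sketch, not something from the original issue: the function names `pod_ips_and_nodes` and `probe_pods` are hypothetical, port 80 is an assumption about where microbot listens, and the column positions assume the `-o wide` layout shown above.

```shell
# Read `kubectl get pods -o wide` output on stdin and print "<pod-ip> <node>"
# for every pod row. Assumes IP is column 6 and NODE is column 7, as in the
# table above; other kubectl versions may lay the columns out differently.
pod_ips_and_nodes() {
  awk 'NR > 1 && NF >= 7 { print $6, $7 }'
}

# Read "<pod-ip> <node>" pairs on stdin and try an HTTP request to each pod.
# Port 80 is an assumption; adjust it to the container port of the deployment.
probe_pods() {
  local ip node
  while read -r ip node; do
    if curl --silent --output /dev/null --max-time 3 "http://${ip}:80/"; then
      echo "ok   ${ip} (${node})"
    else
      echo "FAIL ${ip} (${node})"
    fi
  done
}

# Usage, run on each node in turn:
#   microk8s kubectl get pods -o wide | pod_ips_and_nodes | probe_pods
```

If a node reports "ok" only for pods scheduled on itself, the problem is likely in cross-node pod networking rather than in the service or the application.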

ufw is disabled, and I’ve tried running

sudo iptables -P FORWARD ACCEPT
sudo apt-get install iptables-persistent

on each machine, to no avail. The machines can reach each other fine on other services. I enabled ingress without a service, and each node returns a 404 error, so traffic can clearly be routed.

I’ve attached the inspection logs: inspection-report-20210112_132731.tar.gz

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (1 by maintainers)

Top GitHub Comments

2 reactions
rockaut commented, Jan 14, 2021

I also had problems on my Ubuntu cluster. For me it turned out to be a problem with net_bridge, so I had to load the kernel modules and set the sysctl values.

Added to /etc/modules-load.d/modules.conf:

overlay
br_netfilter
bridge

and to /etc/sysctl.conf:

net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1

Rebooted and all worked.
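As a sketch of how one might verify these settings before rebooting (this is not part of the original comment), the helper below lists which of the three sysctl keys are not set to 1 in a given file. The names `REQUIRED_KEYS`, `missing_sysctl_keys`, and `apply_now` are hypothetical. `apply_now` shows one way to load the modules and reload `/etc/sysctl.conf` without a reboot; it needs root and is only defined here, not run.

```shell
# The three keys from the comment above that should be set to 1.
REQUIRED_KEYS="net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables"

# Print every required key that is NOT set to 1 in the given sysctl-style file.
# Commented-out lines (starting with '#') do not count as set.
missing_sysctl_keys() {
  local file="$1" key
  for key in $REQUIRED_KEYS; do
    if ! awk -F'=' -v key="$key" '
          { gsub(/[[:space:]]/, "", $1); gsub(/[[:space:]]/, "", $2) }
          $1 == key && $2 == "1" { found = 1 }
          END { exit found ? 0 : 1 }
        ' "$file"; then
      echo "$key"
    fi
  done
}

# Load the modules and re-read /etc/sysctl.conf without rebooting.
# Requires root; defined for illustration and not called automatically.
apply_now() {
  sudo modprobe -a overlay br_netfilter bridge
  sudo sysctl -p /etc/sysctl.conf
}

# Usage:
#   missing_sysctl_keys /etc/sysctl.conf   # prints nothing when all keys are set
```

Whether applying the settings this way has the same effect as the reboot described above is an assumption; the original comment only confirms that rebooting worked.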

I had previously installed and uninstalled Docker and also played around with CNI and Podman, so something may have been broken somewhere along the way.

0 reactions
stale[bot] commented, Dec 17, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
