Prometheus add-on pods cannot start on arm64 nodes


It appears that the deployments created by the Prometheus operator are not using multi-arch images. After running microk8s enable prometheus I see the following:

$ kubectl get -n monitoring po
NAME                                  READY   STATUS             RESTARTS   AGE
grafana-5fd6b9c649-75b4w              1/1     Running            0          11m
kube-state-metrics-dcc94d9f8-bkh5t    0/3     CrashLoopBackOff   15         5m56s
node-exporter-dz865                   2/2     Running            0          25m
node-exporter-f27mk                   1/2     CrashLoopBackOff   9          25m
node-exporter-grrkf                   2/2     Running            0          25m
node-exporter-hws6x                   2/2     Running            0          25m
node-exporter-s2x88                   1/2     CrashLoopBackOff   9          25m
prometheus-adapter-5949969998-8s5jj   1/1     Running            0          25m
prometheus-operator-5c7dcf954-jz6ls   0/1     CrashLoopBackOff   9          25m

When I inspect one of the crashing pods I see:

$ kubectl logs -n monitoring pod/kube-state-metrics-dcc94d9f8-bkh5t -c kube-state-metrics
standard_init_linux.go:207: exec user process caused "exec format error"
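
An "exec format error" generally means the container's binary was built for a different CPU architecture than the node it was scheduled on. As a rough sketch (the function names node_arches and pod_nodes are made up for illustration, and the kubernetes.io/arch label is assumed to be set on the nodes), the following helpers show each node's architecture and which node each pod landed on, so the crashing pods can be matched against the arm64 nodes:

```shell
# List every node with its CPU architecture in an extra column.
# -L adds a column for the given node label.
node_arches() {
  kubectl get nodes -L kubernetes.io/arch
}

# Show the pods in a namespace (default: monitoring) together with
# the node each one was scheduled on (NODE column of -o wide).
pod_nodes() {
  kubectl get pods -n "${1:-monitoring}" -o wide
}
```

Calling node_arches and then pod_nodes should make it visible whether the pods in CrashLoopBackOff are the ones running on arm64 nodes.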

I am able to work around the crash-looping pods by running

$ kubectl edit -n monitoring deploy/kube-state-metrics

and adding the following to spec.template.spec:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/arch
          operator: In
          values:
          - amd64
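
The same change could probably be applied without an interactive edit. The sketch below is an assumption, not something taken from the issue: it wraps the affinity above in a JSON patch and passes it to kubectl patch. The names AMD64_AFFINITY_PATCH and pin_to_amd64 are hypothetical.

```shell
# JSON form of the nodeAffinity shown above, nested under
# spec.template.spec so it can be merged into a Deployment.
AMD64_AFFINITY_PATCH='{
  "spec": {"template": {"spec": {"affinity": {"nodeAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": {
      "nodeSelectorTerms": [
        {"matchExpressions": [
          {"key": "kubernetes.io/arch", "operator": "In", "values": ["amd64"]}
        ]}
      ]
    }
  }}}}}
}'

# Patch one Deployment in the monitoring namespace so that its pods
# are only scheduled on amd64 nodes.
pin_to_amd64() {
  kubectl patch deployment "$1" -n monitoring --patch "$AMD64_AFFINITY_PATCH"
}
```

For example, pin_to_amd64 kube-state-metrics would target the deployment edited above. Note that this only keeps the pods off arm64 nodes; it does not make the images multi-arch.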

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 9 (1 by maintainers)

Top GitHub Comments

1 reaction
balchua commented, Dec 5, 2020

@H8to thank you very much for taking the time to test it out.

Let’s wait to get this out in time with the 1.20 version.

I will close this issue as it is resolved by the PR https://github.com/ubuntu/microk8s/pull/1781

1 reaction
balchua commented, Dec 3, 2020

If you want to try and test this, you can grab the snap from here https://github.com/ubuntu/microk8s/suites/1586391336/artifacts/28778064

The snap produced is an amd64 version.

Steps:

  • Disable the existing prometheus add-on if you have it running.
  • Unzip the downloaded artifact, then install the microk8s.snap that is produced by the build. Please note that it should be installed on an amd64 machine.
    sudo snap install microk8s.snap --classic --dangerous
  • Enable prometheus on the node where you installed the microk8s.snap.
    microk8s enable prometheus
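
The steps above can be sketched as one shell function. This is only an illustration: the function name try_test_snap is hypothetical, the path to the downloaded artifact zip is passed in by the caller because its file name is not given here, and the zip is assumed to unpack microk8s.snap into the current directory.

```shell
# Run the test steps in order on an amd64 node.
# $1: path to the downloaded artifact zip (supplied by the caller).
try_test_snap() {
  artifact_zip="$1"
  # Disable the add-on first; ignore the result in case it was not enabled.
  microk8s disable prometheus || true
  # Unpack the artifact, install the locally built snap, then re-enable.
  unzip "$artifact_zip" &&
    sudo snap install microk8s.snap --classic --dangerous &&
    microk8s enable prometheus
}
```

The && chain stops at the first failing step, so prometheus is not re-enabled if the snap fails to install.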

Please note that the snap is using the 1.19 version of microk8s.

I’d be happy if someone can test this one too. Thanks!
