[Question] Pod Monitor doesn't work with Prometheus (rancher-charts) in a different namespace.

Hello,

I’m facing an issue with the new monitoring approach introduced in version 0.20.0. I can successfully get the metrics from every container using curl http://localhost:9404/metrics, but when I deploy the PodMonitor in my Prometheus namespace I cannot see any metrics or even any discovered targets. The monitoring.coreos.com PodMonitor definition is not clear enough to me, and I don’t understand how Prometheus is able to scrape metrics now that the Strimzi operator no longer exposes port 9404.
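
(A note on the mechanism, assuming standard Strimzi 0.20 behaviour: the dedicated metrics Services were removed in that release, and port 9404 is now only declared as a named container port on the pods themselves, which is exactly what a PodMonitor scrapes; no Service is involved. Below is a sketch of the relevant excerpt from a generated Kafka broker pod spec; the port name is an assumption based on the PodMonitor definitions further down:)

ports:
- name: tcp-prometheus   # metrics port targeted by the PodMonitor; no Service needed
  containerPort: 9404
  protocol: TCP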

  • From there I have one more question: what if I want to use an external Prometheus server deployed outside the cluster, for example on a virtual machine? Is there a way to scrape metrics from the LoadBalancer listener?
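
(On the external-Prometheus question: the LoadBalancer listener carries only Kafka protocol traffic, not the metrics endpoint, so an out-of-cluster Prometheus would need port 9404 exposed separately. A minimal sketch, assuming Strimzi's standard broker pod labels; the Service name is hypothetical and not something Strimzi creates:)

apiVersion: v1
kind: Service
metadata:
  name: kafka-metrics-external   # hypothetical name, for illustration only
  namespace: kafka
spec:
  type: LoadBalancer
  selector:
    strimzi.io/cluster: kafka-cluster
    strimzi.io/name: kafka-cluster-kafka   # assumption: standard Strimzi broker pod labels
  ports:
  - name: tcp-prometheus
    port: 9404
    targetPort: 9404

(Note that a single LoadBalancer Service balances across all brokers; per-broker scrape targets from outside the cluster would need one Service per pod.)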

Strimzi:

  • Operator: 0.20
  • Kafka version: 2.5.0
  • Namespace: kafka

Prometheus:

  • Prometheus is able to discover new services via ServiceMonitor definitions from other projects and namespaces.
  • Deployed with rancher using rancher-charts
  • Namespace: cattle-prometheus

The files used to deploy Kafka and PodMonitor:

  • kafka.yaml
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: kafka-cluster
  namespace: kafka
spec:
  kafka:
    # I just use private registry. There are no changes made to the docker file.
    image: registry1.mydomain.com/kafka/kafka:2.5.0
    version: 2.5.0
    replicas: 5
    logging:
      type: inline
      loggers:
        kafka.root.logger.level: "INFO"
    resources:
      requests:
        memory: 16Gi
        cpu: "8"
      limits:
        memory: 24Gi
        cpu: "12"  
    jvmOptions:
      -Xms: "6g"
      -Xmx: "6g"
      -XX:
        UseG1GC: true
        InitiatingHeapOccupancyPercent: 35
        MinMetaspaceFreeRatio: 50
        MaxMetaspaceFreeRatio: 80
        MaxGCPauseMillis: 20
        MetaspaceSize: "96m"
        G1HeapRegionSize: "16M"
    listeners:
      external:
        type: loadbalancer
        tls: false
        overrides:
          bootstrap:
            address: kafka1.mydomain.com 
          brokers:
          - broker: 0
            advertisedHost: kafka1-broker0.mydomain.com
          - broker: 1
            advertisedHost: kafka1-broker1.mydomain.com
          - broker: 2
            advertisedHost: kafka1-broker2.mydomain.com
          - broker: 3
            advertisedHost: kafka1-broker3.mydomain.com
          - broker: 4
            advertisedHost: kafka1-broker4.mydomain.com
      plain: {}
    config:
      auto.create.topics.enable: "true"
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      default.replication.factor: 3
      log.retention.hours: 168
      num.network.threads: 3
      num.io.threads: 8
      transaction.state.log.min.isr: 2
      log.message.format.version: "2.5"
    storage:
      type: jbod
      volumes:
      - id: 0
        type: persistent-claim
        size: 100Gi
        deleteClaim: false
    metrics: {}
  zookeeper:
    image: registry1.mydomain.com/kafka/kafka:2.5.0
    replicas: 5
    resources:
      requests:
        memory: 4Gi
        cpu: "2"
      limits:
        memory: 8Gi
        cpu: "4"  
    jvmOptions:
      -Xms: "4g"
      -Xmx: "4g"
    storage:
      type: persistent-claim
      size: 20Gi
      deleteClaim: false
    metrics: {}
    jmxOptions: {}
  entityOperator:
    topicOperator: {}
    userOperator: {}
  kafkaExporter:
    image: registry1.mydomain.com/kafka/kafka:2.5.0
    groupRegex: ".*"
    topicRegex: ".*"
    logging: debug
    enableSaramaLogging: true
    readinessProbe:
      initialDelaySeconds: 15
      timeoutSeconds: 5
    livenessProbe:
      initialDelaySeconds: 15
      timeoutSeconds: 5
  # cruiseControl:
  #   image: registry1.mydomain.com/kafka/kafka:2.5.0
  #   config:
  #     security.protocol: PLAINTEXT
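
(Side note: metrics: {} in the file above enables the JMX exporter with an empty configuration, so MBeans are exported in the exporter's default naming. A sketch of what a trimmed-down rules section in the same place could look like, loosely modelled on the Strimzi metrics examples:)

    metrics:
      lowercaseOutputName: true
      rules:
      # illustrative rule only: map kafka.server MBean values to Prometheus metric names
      - pattern: "kafka.server<type=(.+), name=(.+)><>Value"
        name: kafka_server_$1_$2
        type: GAUGE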
  • Pod Monitor:
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: cluster-operator-metrics
  namespace: cattle-prometheus
spec:
  selector:
    matchLabels:
      strimzi.io/kind: cluster-operator
  namespaceSelector:
    matchNames:
      - kafka
  podMetricsEndpoints:
  - path: /metrics
    port: http
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: entity-operator-metric
  namespace: cattle-prometheus
  labels:
    app: prometheus
    release: cluster-monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: entity-operator
  namespaceSelector:
    matchNames:
      - kafka
  podMetricsEndpoints:
  - path: /metrics
    port: healthcheck
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: bridge-metrics
  namespace: cattle-prometheus
  labels:
    app: prometheus
    release: cluster-monitoring
spec:
  selector:
    matchLabels:
      strimzi.io/kind: KafkaBridge
  namespaceSelector:
    matchNames:
      - kafka
  podMetricsEndpoints:
  - path: /metrics
    port: rest-api
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kafka-resources-metrics
  namespace: cattle-prometheus
  labels:
    app: prometheus
    release: cluster-monitoring
spec:
  selector:
    matchExpressions:
      - key: "strimzi.io/kind"
        operator: In
        values: ["Kafka", "KafkaConnect", "KafkaConnectS2I", "KafkaMirrorMaker", "KafkaMirrorMaker2"]
  namespaceSelector:
    matchNames:
      - kafka
  podMetricsEndpoints:
  - path: /metrics
    port: tcp-prometheus
    relabelings:
    - separator: ;
      regex: __meta_kubernetes_pod_label_(strimzi_io_.+)
      replacement: $1
      action: labelmap
    - sourceLabels: [__meta_kubernetes_namespace]
      separator: ;
      regex: (.*)
      targetLabel: namespace
      replacement: $1
      action: replace
    - sourceLabels: [__meta_kubernetes_pod_name]
      separator: ;
      regex: (.*)
      targetLabel: kubernetes_pod_name
      replacement: $1
      action: replace
    - sourceLabels: [__meta_kubernetes_pod_node_name]
      separator: ;
      regex: (.*)
      targetLabel: node_name
      replacement: $1
      action: replace
    - sourceLabels: [__meta_kubernetes_pod_host_ip]
      separator: ;
      regex: (.*)
      targetLabel: node_ip
      replacement: $1
      action: replace
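
(One more thing worth checking for cross-namespace scraping, independent of the PodMonitor definitions above: the Prometheus service account needs RBAC permission to discover pods in the kafka namespace. A minimal sketch; the service account name used by the rancher-charts deployment is an assumption here and may differ:)

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-pod-discovery
  namespace: kafka
rules:
- apiGroups: [""]
  resources: ["pods", "services", "endpoints"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-pod-discovery
  namespace: kafka
subjects:
- kind: ServiceAccount
  name: prometheus              # assumption: the actual service account name may differ
  namespace: cattle-prometheus
roleRef:
  kind: Role
  name: prometheus-pod-discovery
  apiGroup: rbac.authorization.k8s.io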

Greetings,

Thank you for this awesome project!

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

2 reactions
jrivers96 commented, Dec 4, 2020

Hello, I think I might have had the same problem and it drove me a bit nuts for a day.

There is a closed issue about only having the pod monitor and not a service monitor: https://github.com/prometheus-operator/prometheus-operator/issues/3164

I had to add serviceMonitorSelector: {} so that my Prometheus would use the pod monitor. I’m on prometheus-operator 0.33.0.

---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  labels:
    app: strimzi
spec:
  replicas: 1
  serviceAccountName: prometheus-server
  serviceMonitorSelector: {}
  podMonitorSelector:
    matchLabels:
      app: strimzi
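
(A related detail for the cross-namespace case: the Prometheus resource also has a podMonitorNamespaceSelector field that controls which namespaces are searched for PodMonitor objects; when it is unset, only the Prometheus resource's own namespace is searched. Since the PodMonitors above live in cattle-prometheus alongside Prometheus itself that default works, but an empty selector makes the intent explicit:)

spec:
  podMonitorNamespaceSelector: {}   # an empty selector matches all namespaces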
1 reaction
scholzj commented, Feb 17, 2021

Thanks for sharing the solution. I guess this can be closed now?
