System test failure on k8 cluster

Please use this only for bug reports. For questions or when you need help, you can use GitHub Discussions, our #strimzi Slack channel, or our user mailing list.

Describe the bug

We are running the regression test RollingUpdateST-testKafkaAndZookeeperScaleUpScaleDown and noticed the following issue on our Kubernetes cluster (v1.18.5). When the persistent storage for the ZooKeeper nodes is set up, the storage claim is created with a capacity of 1Mi, which causes the test to fail, as shown below.

NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    AGE
data-my-cluster-2079133299-zookeeper-0   Bound    pvc-1cae6a93-3ec4-432a-bb50-d5a509f57619   1Mi        RWO            block-storage   18s
data-my-cluster-2079133299-zookeeper-1   Bound    pvc-482b1f8d-f240-474f-a17e-f17ca82482af   1Mi        RWO            block-storage   18s
data-my-cluster-2079133299-zookeeper-2   Bound    pvc-9a8d6b35-1afe-4a14-9df4-82099894d8a7   1Mi        RWO            block-storage   18s
Events:
  Type     Reason                  Age               From                     Message
  ----     ------                  ----              ----                     -------
  Normal   Scheduled               <unknown>         default-scheduler        Successfully assigned namespace-0/my-cluster-2079133299-zookeeper-0 to worker1
  Normal   SuccessfulAttachVolume  82s               attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-1cae6a93-3ec4-432a-bb50-d5a509f57619"
  Warning  FailedMount             3s (x8 over 73s)  kubelet, worker1         MountVolume.MountDevice failed for volume "pvc-1cae6a93-3ec4-432a-bb50-d5a509f57619" : rpc error: code = Internal desc = signal: aborted (core dumped)

However, when we run the same test on a local Minikube setup, the claim is created with 100Mi and the test passes.

NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-my-cluster-1804135696-zookeeper-0   Bound    pvc-eb1ba5df-ebd4-4399-b17c-46a200ca6178   100      RWO            standard       26s
data-my-cluster-1804135696-zookeeper-1   Bound    pvc-e0416ad0-a839-4984-b263-1c14b967a90f   100      RWO            standard       26s
data-my-cluster-1804135696-zookeeper-2   Bound    pvc-d841b6c0-2e56-4249-a25b-c14182d09be5   100      RWO            standard       26s

To resolve this issue, we had to add the required unit (Mi) to the storage sizes in strimzi-kafka-operator/systemtest/src/main/java/io/strimzi/systemtest/templates/crd/KafkaTemplates.java (lines 62 and 68).

    // Builds a Kafka resource with persistent-claim storage for both Kafka and ZooKeeper.
    public static KafkaBuilder kafkaPersistent(String name, int kafkaReplicas, int zookeeperReplicas) {
        Kafka kafka = getKafkaFromYaml(Constants.PATH_TO_KAFKA_PERSISTENT_CONFIG);
        return defaultKafka(kafka, name, kafkaReplicas, zookeeperReplicas)
            .editSpec()
                .editKafka()
                    .withNewPersistentClaimStorage()
                        // Size now carries an explicit unit (Mi).
                        .withSize("100Mi")
                        .withDeleteClaim(true)
                    .endPersistentClaimStorage()
                .endKafka()
                .editZookeeper()
                    .withNewPersistentClaimStorage()
                        // Size now carries an explicit unit (Mi).
                        .withSize("100Mi")
                        .withDeleteClaim(true)
                    .endPersistentClaimStorage()
                .endZookeeper()
            .endSpec();
    }
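
Why the unit matters: a Kubernetes resource quantity with no suffix is read in base units, so a storage size of "100" means 100 bytes, while "100Mi" means 100 × 1024 × 1024 bytes. The sketch below only illustrates that arithmetic. QuantityDemo and toBytes are hypothetical names invented for this example; they are not part of Strimzi or the fabric8 client, and the helper handles only whole numbers with a Ki, Mi, or Gi suffix. How a given storage class treats a 100-byte request (for example, rounding it up to a minimum volume size) depends on the provisioner and is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class QuantityDemo {
    // A few of the binary (power-of-two) suffixes used by Kubernetes quantities.
    private static final Map<String, Long> BINARY_SUFFIXES = new LinkedHashMap<>();
    static {
        BINARY_SUFFIXES.put("Ki", 1L << 10);
        BINARY_SUFFIXES.put("Mi", 1L << 20);
        BINARY_SUFFIXES.put("Gi", 1L << 30);
    }

    // Hypothetical helper: converts a whole-number quantity string to bytes.
    static long toBytes(String quantity) {
        for (Map.Entry<String, Long> suffix : BINARY_SUFFIXES.entrySet()) {
            if (quantity.endsWith(suffix.getKey())) {
                String digits = quantity.substring(0, quantity.length() - 2);
                return Math.multiplyExact(Long.parseLong(digits), suffix.getValue());
            }
        }
        // No suffix: the number is already in bytes.
        return Long.parseLong(quantity);
    }

    public static void main(String[] args) {
        System.out.println("100   -> " + toBytes("100") + " bytes");   // 100 bytes
        System.out.println("100Mi -> " + toBytes("100Mi") + " bytes"); // 104857600 bytes
    }
}
```

Under these assumptions, a size written without a unit asks for roughly a millionth of the storage the test intends to request.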

To Reproduce

Steps to reproduce the behavior:

  1. Run mvn verify for the RollingUpdateST-testKafkaAndZookeeperScaleUpScaleDown regression test from the systemtest directory.

Expected behavior

The RollingUpdateST-testKafkaAndZookeeperScaleUpScaleDown system test passes on our Kubernetes cluster.

Environment (please complete the following information):

  • Strimzi version: 0.26.0
  • Installation method: YAML files
  • Kubernetes cluster: Kubernetes 1.18.5

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
im-konge commented, Oct 19, 2021

@mcullenEST issue should be fixed in #5748

1 reaction
im-konge commented, Oct 19, 2021

Maybe it’s connected to some recent change. We will fix it ASAP. Thanks for spotting this!

