
Kafka health check readiness is always DOWN until it's consumed the first time


Describe the bug

Consider an application that uses the following extensions:

pom.xml:

    <dependency>
        <groupId>io.quarkus</groupId>
        <artifactId>quarkus-smallrye-reactive-messaging-kafka</artifactId>
    </dependency>

    <dependency>
        <groupId>io.quarkus</groupId>
        <artifactId>quarkus-kafka-streams</artifactId>
    </dependency>

The application also produces a custom Kafka Streams topology:

    @Produces
    public Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        JsonbSerde<LoginAttempt> loginAttemptSerde = new JsonbSerde<>(LoginAttempt.class);
        JsonbSerde<LoginAggregation> loginAggregationSerde = new JsonbSerde<>(LoginAggregation.class);

        builder.stream("from", Consumed.with(Serdes.String(), loginAttemptSerde))
                .groupByKey()
                .windowedBy(TimeWindows.of(Duration.ofSeconds(windowsLoginSec)))
                .aggregate(LoginAggregation::new,
                        (id, value, aggregation) -> aggregation.updateFrom(value),
                        Materialized.<String, LoginAggregation, WindowStore<Bytes, byte[]>> as(LOGIN_AGGREGATION_STORE)
                                .withKeySerde(Serdes.String())
                                .withValueSerde(loginAggregationSerde))
                .toStream()
                .filter((k, v) -> v.getCode() == UNAUTHORIZED.getStatusCode() || v.getCode() == FORBIDDEN.getStatusCode())
                .filter((k, v) -> v.getCount() > threshold)
                .to("target");

        return builder.build();
    }
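The topology above only writes to "target" when a windowed aggregate carries a 401/403 code and its count exceeds `threshold`, so the topic can stay idle for a long time after startup. The following is a plain-Java sketch of that filtering, not Kafka Streams itself. It assumes that `LoginAggregation.updateFrom` increments a count and keeps the latest status code (the real class is not shown in the issue), and it ignores record caching, grace periods, and late records. `LoginWindowModel` and `Attempt` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the topology's window + filter logic (NOT Kafka Streams).
// Assumption: LoginAggregation counts attempts and keeps the latest status code.
public class LoginWindowModel {

    static final class Attempt {
        final String user;
        final long timestampMs;
        final int code;

        Attempt(String user, long timestampMs, int code) {
            this.user = user;
            this.timestampMs = timestampMs;
            this.code = code;
        }
    }

    // Tumbling windows (TimeWindows.of without advanceBy) are aligned to the epoch.
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    // Returns one "user@windowStart" entry per update that would reach "target".
    static List<String> alerts(List<Attempt> attempts, long sizeMs, long threshold) {
        Map<String, Long> counts = new HashMap<>();
        List<String> emitted = new ArrayList<>();
        for (Attempt a : attempts) {
            String key = a.user + "@" + windowStart(a.timestampMs, sizeMs);
            long count = counts.merge(key, 1L, Long::sum);
            boolean denied = a.code == 401 || a.code == 403; // UNAUTHORIZED or FORBIDDEN
            if (denied && count > threshold) {
                emitted.add(key);
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Attempt> attempts = List.of(
                new Attempt("alice", 1_000, 401),
                new Attempt("alice", 2_000, 401),
                new Attempt("alice", 3_000, 401),
                new Attempt("bob", 1_500, 200));
        // 10 s windows, threshold 2: only alice's third denied attempt is forwarded.
        System.out.println(alerts(attempts, 10_000, 2)); // prints [alice@0]
    }
}
```

Until some key crosses the threshold, nothing at all is written to "target", which matters for the readiness behavior described in the title.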

Although the application works correctly, the readiness health check reports that “target” is DOWN:

{
    "status": "DOWN",
    "checks": [
        {
            "name": "SmallRye Reactive Messaging - readiness check",
            "status": "DOWN",
            "data": {
                "login-http-response-values": "[OK]",
                "login-denied": "[KO]"
            }
        },
        {
            "name": "Kafka Streams topics health check",
            "status": "UP",
            "data": {
                "available_topics": "login-http-response-values"
            }
        }
    ]
}
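The response above shows how a single "[KO]" channel turns the whole readiness result DOWN even though the Kafka Streams check is UP. The following toy model of that aggregation is not SmallRye code. It assumes that a check is UP only when every listed channel is "[OK]", and that the overall status is UP only when every check is UP (the MicroProfile Health aggregation rule). `ReadinessModel` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the readiness aggregation seen in the response; NOT SmallRye code.
public class ReadinessModel {

    // A check is UP only if all of its channels report "[OK]" (assumption).
    static boolean checkUp(Map<String, String> channels) {
        return channels.values().stream().allMatch("[OK]"::equals);
    }

    // Overall status is UP only if every check is UP.
    static String overall(List<Boolean> checks) {
        return checks.stream().allMatch(Boolean::booleanValue) ? "UP" : "DOWN";
    }

    public static void main(String[] args) {
        Map<String, String> messaging = new LinkedHashMap<>();
        messaging.put("login-http-response-values", "[OK]");
        messaging.put("login-denied", "[KO]");

        boolean messagingUp = checkUp(messaging); // false: one channel is [KO]
        boolean streamsUp = true;                 // "Kafka Streams topics health check" is UP
        System.out.println(overall(List.of(messagingUp, streamsUp))); // prints DOWN
    }
}
```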

This was working fine prior to 1.12.0.Final.

Expected behavior

The health check should be UP.

Actual behavior

The health check is DOWN in 1.12.0.Final and 999-SNAPSHOT.

To Reproduce

Steps to reproduce the behavior:

git clone https://github.com/Sgitario/quarkus-examples
cd reproducers/kafka-streams-reactive-messaging
mvn clean install -Dquarkus.version=1.12.0.Final

The build fails because the test checks whether the health check is UP.
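A readiness assertion of this kind can be sketched as a small polling helper. This is a hypothetical sketch, not the reproducer's actual test. The response body comes from a `Supplier<String>` so the logic can run offline; in a real test it would be an HTTP GET on the readiness endpoint (assumed to be `/q/health/ready` in recent Quarkus versions). The regex also assumes, as in the responses in this issue, that the first "status" key is the top-level one; a real test should use a JSON parser. Under the behavior described in the title, such a poll keeps seeing DOWN for as long as the outgoing channel has not been used.

```java
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the reproducer's actual test code.
public class ReadinessPoller {

    // Assumption: the first "status" key is the top-level one (true for the bodies above).
    private static final Pattern STATUS = Pattern.compile("\"status\"\\s*:\\s*\"(UP|DOWN)\"");

    static String topLevelStatus(String body) {
        Matcher m = STATUS.matcher(body);
        return m.find() ? m.group(1) : "UNKNOWN";
    }

    // Polls until the body reports UP or the attempts run out.
    static boolean waitUntilUp(Supplier<String> fetchBody, int attempts, long delayMs) {
        for (int i = 0; i < attempts; i++) {
            if ("UP".equals(topLevelStatus(fetchBody.get()))) {
                return true;
            }
            if (i + 1 < attempts) {
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and give up
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in for an HTTP GET on the readiness endpoint.
        Supplier<String> down = () -> "{\"status\": \"DOWN\", \"checks\": []}";
        System.out.println(waitUntilUp(down, 3, 10)); // prints false
    }
}
```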

If we build the reproducer with mvn clean install -Dquarkus.version=1.11.5.Final, it works:

{
    "status": "UP",
    "checks": [
        {
            "name": "SmallRye Reactive Messaging - readiness check",
            "status": "UP",
            "data": {
                "login-http-response-values": "[OK]",
                "target": "[OK]"
            }
        },
        {
            "name": "Kafka Streams topics health check",
            "status": "UP",
            "data": {
                "available_topics": "login-http-response-values"
            }
        }
    ]
}

Quarkus version or git rev: 1.12.0.Final and 999-SNAPSHOT

Issue Analytics

Created 3 years ago · Comments: 9 (8 by maintainers)

Top GitHub Comments

cescoffier commented, Mar 9, 2021

@Sgitario I added a note to the migration guide.

Sgitario commented, Mar 9, 2021

Yes, until we find a better way to implement the check.

Thanks for updating the docs. Since this is a breaking change for users running Kafka on OpenShift/K8s, how should we describe it in the migration guide?
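The symptom in the title can be captured with a toy model: the outgoing channel reports "[KO]" until it has handled its first message. This is not SmallRye's actual check; `FirstUseReadiness` and its `onMessage` hook are hypothetical. Combined with a topology that writes to "target" only after a burst of denied logins, a check that behaves this way keeps readiness DOWN after startup. On OpenShift/Kubernetes, a failing readiness probe keeps the pod marked as not ready, which is why the change is breaking there.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model of the symptom described in the issue title; NOT SmallRye's actual check.
// Assumption: the outgoing channel only reports "[OK]" once it has handled a message.
public class FirstUseReadiness {

    private final AtomicBoolean used = new AtomicBoolean(false);

    // Called whenever the channel handles a message (hypothetical hook).
    void onMessage() {
        used.set(true);
    }

    String readiness() {
        return used.get() ? "[OK]" : "[KO]";
    }

    public static void main(String[] args) {
        FirstUseReadiness target = new FirstUseReadiness();
        System.out.println(target.readiness()); // prints [KO]: nothing written yet
        target.onMessage();                     // first alert forwarded to "target"
        System.out.println(target.readiness()); // prints [OK]
    }
}
```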