Kafka: Keeps expiring consumers

Bug Report

Current behavior

We have 10 microservices that all communicate with each other via Kafka. We have noticed that a service randomly fails to subscribe to its topic, or randomly stops consuming and logs a Kafka error saying the heartbeat was not received, while the service itself otherwise works fine.

[Nest] 19 - 06/16/2021, 1:09:12 PM [ClientKafka] ERROR [Connection] Response Heartbeat(key: 12, version: 3) {"timestamp":"2021-06-16T13:09:12.779Z","logger":"kafkajs","broker":"kafka-0.kafka-headless.dev.svc.cluster.local:9092","clientId":"reviews-ts-service-client","error":"The group is rebalancing, so a rejoin is needed","correlationId":1241,"size":10} +2857ms
[Nest] 19 - 06/16/2021, 1:09:12 PM [ClientKafka] ERROR [Runner] The group is rebalancing, re-joining {"timestamp":"2021-06-16T13:09:12.779Z","logger":"kafkajs","groupId":"reviews-consumer-ts-customer-client","memberId":"reviews-ts-service-client-453b2860-fdab-4c01-aa98-e015667b8d3b","error":"The group is rebalancing, so a rejoin is needed","retryCount":0,"retryTime":330} +1m

[Nest] 21 - 06/16/2021, 6:49:52 PM [ClientKafka] ERROR [Connection] Response Heartbeat(key: 12, version: 3) {"timestamp":"2021-06-16T18:49:52.458Z","logger":"kafkajs","broker":"kafka-0.kafka-headless.dev.svc.cluster.local:9092","clientId":"captain-ps-service-client","error":"The coordinator is not aware of this member","correlationId":54,"size":10} +327904ms
[Nest] 21 - 06/16/2021, 6:49:52 PM [ClientKafka] ERROR [Runner] The coordinator is not aware of this member, re-joining the group {"timestamp":"2021-06-16T18:49:52.460Z","logger":"kafkajs","groupId":"captain-consumer-ps-client","memberId":"captain-ps-service-client-77090749-5dd9-4d17-a12b-aa072579caec","error":"The coordinator is not aware of this member","retryCount":7,"retryTime":30000} +1m

Input Code

import { KafkaOptions, Transport } from "@nestjs/microservices";
import appConfig from "config/appConfig";

export const microServiceConfig: KafkaOptions = {
  transport: Transport.KAFKA,

  options: {
    client: {
      clientId: 'promocode-service',
      brokers: [...`${appConfig().KafkaHost}`.split(",")],
    },
    consumer: {
      groupId: 'promocode-consumer',
      sessionTimeout: 300000,
      retry: { retries: 30 },
    },
    subscribe: {
      fromBeginning: false,
    }
  }
};

Expected behavior

It is not clear why Kafka keeps timing out randomly. If I redeploy, everything works, and then it stops again. Is the wrapper causing the issue? These random failures make me wonder what the cause is.

This is running on k8s, and the behavior is seen in only 1-2 users. Kafka has enough memory!

All consumers have different group IDs, and all have a high session timeout as well.

Issue Analytics

State:
Created 2 years ago
Reactions:5
Comments:5 (1 by maintainers)

Top GitHub Comments

3 reactions

bigtable2006 commented, Oct 21, 2021

Hi @jayeshanandani ,

By default, the heartbeat interval is 3 seconds (heartbeatInterval = 3s), and the library calls its heartbeat method every 5 seconds (maxWaitTimeInMs = 5s).

What does that mean? Every 5 seconds the library calls the heartbeat method, which decides whether to send a heartbeat request to the Kafka broker based on the condition below.

Called after every "maxWaitTimeInMs":

// kafkajs/src/consumer/consumerGroup.js
async heartbeat() {
  ....
  if (memberId && now >= this.lastRequest + heartbeatInterval) {
    // Send a heartbeat request to the Kafka broker to keep the session alive.
    await this.coordinator.heartbeat(payload)
    this.lastRequest = Date.now()
    ...
  }
}

In my case, the handler does heavy processing (it parses, processes, and formats JSON) and takes more than 26s to finish a message. It looks like my service cannot send the heartbeat to the Kafka broker during that time, so my consumer is expired and killed.
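A different way to approach the same symptom (not the configuration fix described in this comment) is to send heartbeats from inside the long-running handler. kafkajs passes a heartbeat function in the eachMessage payload; the sketch below uses a simplified local stand-in for that payload so it runs on its own. The names MessagePayload, formatRecord, handleHeavyMessage, and chunkSize are hypothetical, and the assumption is that the message value is a JSON array that can be processed in chunks.

```typescript
// Simplified stand-in for the kafkajs eachMessage payload (assumption: only the
// fields used here; the real payload also carries topic, partition, and more).
interface MessagePayload {
  message: { value: string | null };
  heartbeat: () => Promise<void>;
}

// Hypothetical heavy step: format one parsed record.
function formatRecord(record: unknown): string {
  return JSON.stringify(record);
}

// Process the records in chunks and call heartbeat() after each chunk, so a
// slow handler keeps giving the client a chance to signal that it is alive.
async function handleHeavyMessage(
  { message, heartbeat }: MessagePayload,
  chunkSize = 100,
): Promise<string[]> {
  const parsed: unknown = JSON.parse(message.value ?? "[]");
  const records: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
  const out: string[] = [];

  for (let i = 0; i < records.length; i += chunkSize) {
    for (const record of records.slice(i, i + chunkSize)) {
      out.push(formatRecord(record));
    }
    await heartbeat();
  }
  return out;
}
```

With a real consumer, the handler would be wired in roughly as eachMessage: (payload) => handleHeavyMessage(...), converting the Buffer value to a string first. The heartbeat request itself may still be gated by heartbeatInterval, as the condition quoted above suggests.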

HOW TO RESOLVE THIS ISSUE?

    sessionTimeout: 60000,
    heartbeatInterval: 40000,
    maxWaitTimeInMs: 43000,

sessionTimeout: it should be greater than the processing time of the method.
heartbeatInterval: someone said it should be 2/3 of sessionTimeout.
maxWaitTimeInMs: it must be greater than heartbeatInterval.

This issue was resolved by the above configuration.
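The three rules of thumb in this comment can be written down as a small check. This is only a sketch of the commenter's guidance, not a kafkajs or NestJS API: ConsumerTimings and checkTimings are hypothetical names, and the "heartbeatInterval below sessionTimeout" rule is implied by the 2/3 suggestion rather than stated outright.

```typescript
// The three consumer settings discussed above, all in milliseconds.
interface ConsumerTimings {
  sessionTimeout: number;
  heartbeatInterval: number;
  maxWaitTimeInMs: number;
}

// Returns a list of rule violations; an empty list means the settings
// satisfy the rules of thumb from the comment.
function checkTimings(t: ConsumerTimings, maxProcessingMs: number): string[] {
  const problems: string[] = [];
  if (t.sessionTimeout <= maxProcessingMs) {
    problems.push("sessionTimeout should exceed the longest processing time");
  }
  if (t.heartbeatInterval >= t.sessionTimeout) {
    problems.push("heartbeatInterval should be below sessionTimeout");
  }
  if (t.maxWaitTimeInMs <= t.heartbeatInterval) {
    problems.push("maxWaitTimeInMs should exceed heartbeatInterval");
  }
  return problems;
}

// The configuration that resolved the issue, with the 26s handler from the comment.
const working: ConsumerTimings = {
  sessionTimeout: 60000,
  heartbeatInterval: 40000,
  maxWaitTimeInMs: 43000,
};
console.log(checkTimings(working, 26000)); // prints []
```

In a NestJS KafkaOptions object these values would sit in the options.consumer block, next to groupId.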

Note: the first time, I configured

    sessionTimeout: 60000,
    heartbeatInterval: 40000,
    maxWaitTimeInMs: 30000,

It always showed this error:

INFO [GroupCoordinator 1]: Preparing to rebalance group local-commission-normalizer-client in state PreparingRebalance with old generation 16 (__consumer_offsets-33) (reason: removing member local-normalizer-client-6f5cecee-d77b-45a6-9f7b-1f0bff49f5ef on heartbeat expiration) (kafka.coordinator.group.GroupCoordinator)

=> The heartbeat method is called too early: it is called after 30s, but the condition for sending a request to the Kafka broker requires 40s to have passed. That is why the error happens.
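The arithmetic behind this can be sketched with a simplified model. It assumes the heartbeat method is invoked exactly every maxWaitTimeInMs, starting from time 0, and that a request goes out only when the quoted condition holds; real timing in kafkajs depends on fetch and processing durations. The function names are hypothetical.

```typescript
// Time (ms since the last heartbeat request) at which the next request is
// actually sent, under the simplified model described above.
function firstHeartbeatSentAt(heartbeatInterval: number, checkEveryMs: number): number {
  if (checkEveryMs <= 0) {
    throw new Error("checkEveryMs must be positive");
  }
  const lastRequest = 0;
  let now = checkEveryMs;
  // Same gate as the quoted condition: now >= lastRequest + heartbeatInterval.
  while (now < lastRequest + heartbeatInterval) {
    now += checkEveryMs;
  }
  return now;
}

// True when the session timeout elapses no later than the first heartbeat.
function expiresBeforeHeartbeat(
  sessionTimeout: number,
  heartbeatInterval: number,
  checkEveryMs: number,
): boolean {
  return firstHeartbeatSentAt(heartbeatInterval, checkEveryMs) >= sessionTimeout;
}

console.log(firstHeartbeatSentAt(40000, 30000)); // prints 60000
console.log(firstHeartbeatSentAt(40000, 43000)); // prints 43000
console.log(expiresBeforeHeartbeat(60000, 40000, 30000)); // prints true
console.log(expiresBeforeHeartbeat(60000, 40000, 43000)); // prints false
```

With maxWaitTimeInMs at 30s, the check at 30s is skipped and the next one lands at 60s, which is already the session timeout. With 43s, the first check passes the 40s gate well inside the 60s window.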

1 reaction

jayeshanandani commented, Jun 21, 2021

@kamilmysliwiec do we need more information here? Any input would be of great help.