Best consume method for handling back-pressure
Hi. I’ve been using node-rdkafka for a little while and overall I am very pleased with how it works, great lib! I do have a question, however, that I am not 100% sure of the answer to. I have a consumer process that needs to handle back-pressure effectively: I am doing ETL into a slower database cluster, so I don’t want the consumer to be overwhelmed and go OOM. However, I am still getting OOM issues periodically. I was looking through the library code on the JS side and saw this comment: https://github.com/Blizzard/node-rdkafka/blob/master/lib/kafka-consumer.js#L382
This suggests that using the consume() method in this way is going to result in OOMs in situations like mine. I have set librdkafka settings such as queued.min.messages so it only pre-fetches a small number of messages in the background. I’ve also built an internal queue in my process with a pause/resume mechanism to allow for more control. For the most part this has greatly reduced the OOMs, but as mentioned I still get them. Below is a skeleton version of what I have running; comments on any issues or reasons for the OOM would be greatly appreciated.
import { EventEmitter } from 'events';
import * as Kafka from 'node-rdkafka';
// consumerConfig, topicConfig, transformConfig, handleErrors and messageProcessing
// are defined elsewhere in the project.

let isPaused = false;
let consumerRunCheck: ReturnType<typeof setInterval>;
try {
// Kafka consumer tracking
let processingQueue: Array<Kafka.Message> = []; // Internal message queue
const queueStatus = new EventEmitter(); // Init internal batch queue emitter
let lastBatchTime: number = Date.now();
const onRebalance = async function onRebalance(err: Kafka.LibrdKafkaError, assignments: Array<Kafka.Assignment>): Promise<void> {
try {
if (err.code === Kafka.CODES.ERRORS.ERR__ASSIGN_PARTITIONS) {
await consumer.assign(assignments);
} else if (err.code === Kafka.CODES.ERRORS.ERR__REVOKE_PARTITIONS) {
processingQueue = [];
await consumer.unassign();
} else {
console.error(`Re-balance error: ${err.message}`);
}
} catch (err) {
handleErrors(err);
}
}
// Create the Kafka consumer
const consumer: Kafka.KafkaConsumer = new Kafka.KafkaConsumer({
...consumerConfig,
'rebalance_cb': onRebalance
}, topicConfig);
// Connect to the Kafka broker(s)
consumer.connect();
consumer.on('ready', () => {
consumer.subscribe(transformConfig.topics);
consumer.setDefaultConsumeTimeout(transformConfig.defaultConsumeTimeout);
consumer.consume();
});
// Data event detected, push to the internal processing queue
consumer.on('data', (data: Kafka.Message) => {
processingQueue.push(data);
// Check to see if internal queue is full and trigger batch processing
if (isPaused === false && processingQueue.length >= transformConfig.internalQueueMax) {
consumer.pause(consumer.assignments());
isPaused = true;
queueStatus.emit('batchReady');
}
});
// Batch read for processing event detected
queueStatus.on('batchReady', () => {
// Process the batch of messages
return messageProcessing(processingQueue).catch((err) => {
handleErrors(err);
}).finally(() => {
// Free the internal queue memory for the next batch
processingQueue.length = 0;
// Set the last batch time
lastBatchTime = Date.now();
// Commit offsets
consumer.commit();
// Resume consuming messages
consumer.resume(consumer.assignments());
isPaused = false;
});
});
// Check to see if we need to restart things
consumerRunCheck = setInterval(() => {
console.log(`Internal processing queue has ${processingQueue.length} messages waiting (isPaused: ${isPaused}).`);
// Calculate the last time we saw a batch
const lastBatchDiff: number = Math.floor((Date.now() - lastBatchTime)/1000);
if (isPaused && lastBatchDiff >= 60) {
console.log(`Consumer appears to be stuck, unpausing.`);
// Resume consuming messages
consumer.resume(consumer.assignments());
isPaused = false;
}
}, 60000);
} catch (err) {
console.error(err.message);
process.exit(1);
}
The consumer settings are as follows:
export const consumerConfig: ConsumerGlobalConfig = {
'client.id': `my-client`,
'group.instance.id': `my-client-instance-${Date.now()}`,
'metadata.broker.list': brokers,
'group.id': `my-events`,
'session.timeout.ms': 30000,
'heartbeat.interval.ms': 3000,
'enable.auto.commit': false,
'queued.min.messages': 10,
'queued.max.messages.kbytes': 65536,
'fetch.message.max.bytes': 1048576,
'fetch.max.bytes': 1048576
};
I am considering switching from the style listed above to making use of the callback on the consume() method (see the sketch below), but before doing that it would be interesting to see if there is something obvious I am doing (or not doing) in the above code.
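For illustration, this is a minimal sketch of the callback-driven (non-flowing) style I have in mind, reusing the consumer, transformConfig, messageProcessing and handleErrors from the skeleton above; the batch size of 50 and the consumeBatch helper are placeholders I made up, not something from the real code:

// Non-flowing mode: request at most BATCH_SIZE messages per consume() call and
// only request the next batch after the previous one has been fully processed.
const BATCH_SIZE = 50; // placeholder value

function consumeBatch(): void {
  consumer.consume(BATCH_SIZE, (err: Kafka.LibrdKafkaError, messages: Kafka.Message[]) => {
    if (err) {
      handleErrors(err);
      setImmediate(consumeBatch); // keep polling after an error
      return;
    }
    // messages.length is anywhere from 0 to BATCH_SIZE depending on availability
    messageProcessing(messages)
      .then(() => {
        if (messages.length > 0) {
          consumer.commit();
        }
      })
      .catch(handleErrors)
      .finally(() => {
        // Only ask for more once this batch is done, so JS memory is bounded by BATCH_SIZE
        setImmediate(consumeBatch);
      });
  });
}

consumer.on('ready', () => {
  consumer.subscribe(transformConfig.topics);
  consumer.setDefaultConsumeTimeout(transformConfig.defaultConsumeTimeout);
  consumeBatch(); // instead of consumer.consume() with no arguments
});

The idea is that librdkafka still pre-fetches in the background (bounded by queued.min.messages / queued.max.messages.kbytes), but the process itself never holds more than one batch in JS memory at a time.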
Top GitHub Comments
@sathyarajagopal The consumer’s stream API extends the native Readable class, but in the past there was a problem where it did not stop reading messages once the internal buffer reached the highWaterMark threshold. (I don’t know if this has been fixed.)
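Roughly, the stream API is used like this (a minimal sketch reusing the config objects from your snippet; writeToDatabase is a made-up placeholder for the ETL step, and whether highWaterMark is actually honored is exactly the open question above):

import * as Kafka from 'node-rdkafka';
import { Writable } from 'stream';

// Object-mode sink that writes one message at a time to the slow database.
const dbSink = new Writable({
  objectMode: true,
  highWaterMark: 100, // buffer at most ~100 messages inside the sink
  write(message: Kafka.Message, _encoding, callback) {
    writeToDatabase(message)            // placeholder for the real ETL step
      .then(() => callback())
      .catch((err) => callback(err));
  }
});

const stream = Kafka.KafkaConsumer.createReadStream(consumerConfig, topicConfig, {
  topics: transformConfig.topics,
  highWaterMark: 100 // intended to cap the readable-side buffer as well
});

// pipe() applies back-pressure: reading slows down while dbSink is busy/full.
stream.pipe(dbSink);
stream.on('error', handleErrors);
dbSink.on('error', handleErrors);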
Hi,
In our company we are using the consumer in non-flowing mode and it works fine for handling back-pressure (with the async module or without it).