Support natural batching in flow

Natural (aka smart) batching is a technique in stream processing that optimizes throughput without sacrificing latency. Take a concurrent queue as an example: the consumer atomically drains all the items observed at some instant and then processes them as a single batch. Ideally the queue should be bounded, which both caps the batch size and provides backpressure to the sender.

It’s called “natural” batching because there is no imposed batch size: when traffic is low, the consumer processes each item as soon as it arrives, and no throughput optimization is needed. When traffic rises, the consumer automatically starts processing larger batches, amortizing the fixed cost of a single operation, such as a database INSERT, across many items.

I wrote this sample code that achieves the basic goal:

import kotlinx.coroutines.*
import kotlinx.coroutines.channels.*

const val batchLimit = 20

@ObsoleteCoroutinesApi
suspend inline fun <T : Any> ReceiveChannel<T>.consumeBatched(
        handleItems: (List<T>) -> Unit
) {
    val buf = mutableListOf<T>()
    while (true) {
        // Suspend until the first item arrives; null means the channel was closed.
        receiveOrNull()?.also { buf.add(it) } ?: break
        // Opportunistically grab whatever is already enqueued, up to batchLimit in total.
        for (x in 2..batchLimit) {
            poll()?.also { buf.add(it) } ?: break
        }
        handleItems(buf)
        buf.clear()
    }
}

We can test it with this:

@ObsoleteCoroutinesApi
@ExperimentalCoroutinesApi
fun main() {
    val chan = generateMockTraffic()
    runBlocking {
        chan.consumeBatched { println("Received items: $it") }
    }
}

@ExperimentalCoroutinesApi
private fun generateMockTraffic(): ReceiveChannel<Int> {
    return GlobalScope.produce(capacity = batchLimit) {
        (1..100).forEach {
            send(it)
            // Pause after every tenth item to simulate bursty traffic.
            if (it % 10 == 0) {
                delay(1)
            }
        }
    }
}

consumeBatched() polls the queue one item at a time and therefore has to impose an explicit batch limit. It would be more efficient if written against a concurrent queue such as the Agrona project’s OneToOneConcurrentArrayQueue, which supports a drain operation (see the sketch below).
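
To make the comparison concrete, here is a minimal sketch of a drain-based consumer, assuming Agrona is on the classpath; consumeDrained is a hypothetical name, and the idle handling is a placeholder for one of Agrona’s IdleStrategy implementations:

import org.agrona.concurrent.OneToOneConcurrentArrayQueue

// Sketch only: drain everything currently observable in one sweep, bounded
// by the queue capacity, instead of polling one item at a time.
fun <T> consumeDrained(
    queue: OneToOneConcurrentArrayQueue<T>,
    handleItems: (List<T>) -> Unit
) {
    val buf = ArrayList<T>(queue.capacity())
    while (!Thread.currentThread().isInterrupted) {
        if (queue.drainTo(buf, queue.capacity()) > 0) {
            handleItems(buf) // the batch size adapts to however much piled up
            buf.clear()
        } else {
            Thread.onSpinWait() // placeholder; real code would use an idle strategy
        }
    }
}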

Could support for natural batching be considered as a feature to add?


Comment by @qwwdfsad, taken from Stack Overflow:

It depends on the desired API surface. A drain member is unlikely to fit channel semantics: it constrains the implementation, it would have to expose a drain limit somehow, and it gives the channel a more “collection-like” API. E.g., how should drain behave with an unlimited channel? Is it possible to implement drain efficiently (with a pre-sized buffer, but avoiding OOMs and unbounded collections) once and use it with any channel implementation?

What could be improved is additional hints from the channel, such as its expected capacity and the count of currently enqueued elements. These could have relaxed semantics with a default implementation and act as hints to a drain extension with some reasonable, configurable upper bounds.
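
As a rough illustration of that idea, the hints might look like this (entirely hypothetical; no such API exists in kotlinx.coroutines):

// Hypothetical: relaxed, best-effort hints a channel implementation could
// expose. Values may be stale and are only used to pre-size drain buffers.
interface ChannelDrainHints {
    val expectedCapacity: Int get() = 32
    val enqueuedCountHint: Int get() = 0
}

// A drain extension would then clamp its initial buffer size, for example:
// val initialSize = hints.enqueuedCountHint.coerceIn(1, maxBatch)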

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Reactions: 19
  • Comments: 36 (21 by maintainers)

Top GitHub Comments

2 reactions
stiost commented, May 5, 2022

Any progress on this? A simple receiveMany(maxCount) would help my case a lot.
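
Something along those lines can be approximated today as an extension function. A minimal sketch, using the stable tryReceive API from kotlinx.coroutines 1.5+ (receiveMany is not a real library function):

import kotlinx.coroutines.channels.ReceiveChannel

// Hypothetical receiveMany: suspends for the first element (throwing if the
// channel is closed, like receive() does), then returns whatever else is
// already buffered, up to maxCount items in total.
suspend fun <T> ReceiveChannel<T>.receiveMany(maxCount: Int): List<T> {
    val buf = ArrayList<T>(maxCount)
    buf.add(receive())
    while (buf.size < maxCount) {
        buf.add(tryReceive().getOrNull() ?: break)
    }
    return buf
}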

2 reactions
elizarov commented, Oct 2, 2019

I think we can manage to implement it as part of #1302. That is, a single chunked operator that supports duration and size-limit parameters should work like “natural buffering” when the duration is set to zero and the size limit is set to a very big value.
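
To illustrate that special case, here is a minimal sketch of the duration-zero behaviour as a Flow operator; chunkedNatural is a hypothetical name (this is not the operator proposed in #1302), and the sketch assumes kotlinx.coroutines 1.5+ for receiveCatching/tryReceive:

import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.channels.produce
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow

@OptIn(ExperimentalCoroutinesApi::class)
fun <T> Flow<T>.chunkedNatural(maxSize: Int = 1024): Flow<List<T>> = flow {
    coroutineScope {
        // Buffer the upstream into a channel so elements can accumulate
        // while the collector is busy handling the previous batch.
        val upstream = produce(capacity = maxSize) {
            collect { send(it) }
        }
        val buf = ArrayList<T>()
        while (true) {
            // Suspend for the first element; stop when the upstream completes.
            buf.add(upstream.receiveCatching().getOrNull() ?: break)
            // Greedily take whatever is already buffered, up to maxSize.
            while (buf.size < maxSize) {
                buf.add(upstream.tryReceive().getOrNull() ?: break)
            }
            emit(buf.toList())
            buf.clear()
        }
    }
}

Under low traffic each emitted list holds a single element; under load the lists grow toward maxSize, which is exactly the “natural buffering” behaviour described above.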
