Creating DAGs with channels - proposing transforms and pipes
Directed Graphs

Think of directed graphs of channels & coroutines, with producers as message sources and actors as sinks. What's missing in this picture is:
- intermediate transform nodes
- graph edges / interconnects or piping
e.g. Node.js Streams use similar concepts for high throughput and performance.
The Proposal
I'm proposing adding a couple of coroutines (note that these are early prototypes and not up to par with `produce` et al):
- transform coroutine - like `produce` & `actor`, a `transform` is a combination of a coroutine, the state that is confined to and encapsulated in this coroutine, and two channels to communicate with upstream and downstream coroutines.
- pipe coroutine - a stateless coroutine that consumes messages from a `ReceiveChannel` and sends them to a downstream `SendChannel`. When the downstream `SendChannel` is part of a `Channel`, it returns the downstream channel's `ReceiveChannel` for further chaining (like a shell pipe sequence `$ cmd1 | cmd2 | cmd3 ...`).
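Since the prototype itself isn't shown in the issue, here is a minimal sketch of what a `pipe`-style stage could look like on top of the existing `produce` builder. The names and signatures are my own illustration, not the proposal's actual API:

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.*

// Illustrative sketch only: a pipe-like stage built on the existing
// `produce` builder. It consumes from an upstream ReceiveChannel and
// returns a downstream ReceiveChannel for further chaining.
fun <T, R> CoroutineScope.pipe(
    upstream: ReceiveChannel<T>,
    transform: suspend (T) -> R
): ReceiveChannel<R> = produce {
    for (msg in upstream) send(transform(msg))
}

fun main() = runBlocking {
    val source = produce { for (i in 1..3) send(i) }
    val doubled = pipe(source) { it * 2 }
    val result = mutableListOf<Int>()
    for (x in doubled) result.add(x)
    println(result) // prints [2, 4, 6]
}
```

Because each stage returns a `ReceiveChannel`, stages chain naturally, which is the shell-pipe analogy the proposal draws.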
Example 1
This example reads blocks (default 1024) as `ByteBuffer`s from a file and decodes the blocks to UTF-8:

```kotlin
val data: String // the file's contents

FS.createReader(inputFile.toPath())
    .pipe(decodeUtf8())
    .pipe(contents {
        assertEquals(data, it.joinToString(""))
    })
    .drainAndJoin()
```
- `createReader` returns a `ReceiveChannel` wrapper around `aRead`
- `decodeUtf8` receives `ByteBuffer`s and returns `String`s
- `contents` is a transform which returns a list of all messages after the channel closes
Example 2
Like the previous example, here we read blocks and convert them to UTF-8 strings, then further split the text into lines and count the number of lines.
```kotlin
val data: String // the file's contents
val lines = data.split("\n")

val listener = Channel<String>()
val count = async(coroutineContext) {
    listener.count()
}
val teeListener = tee(listener, context = coroutineContext)

FS.createReader(inputFile.toPath())
    .pipe(decodeUtf8())
    .pipe(splitter)
    .pipe(teeListener)
    .pipe(contents {
        assertEquals(lines.size, it.size)
    })
    .drainAndJoin()

assertEquals(lines.size, count.await())
```
- `splitter` splits incoming `String` blocks into individual lines and pushes each line as a message on its downstream channel.
- `tee` is a passthrough transform that replicates messages on the provided `ReceiveChannel`
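A hedged sketch of what such a `tee` stage could look like (my own illustration, not the proposal's actual signature): every message is copied to a side channel and then passed downstream unchanged.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.*

// Illustrative sketch only: replicate every message onto a side channel
// while passing it through downstream unchanged.
fun <T> CoroutineScope.tee(
    upstream: ReceiveChannel<T>,
    side: SendChannel<T>
): ReceiveChannel<T> = produce {
    for (msg in upstream) {
        side.send(msg)   // copy to the listener
        send(msg)        // pass through downstream
    }
    side.close()         // let side consumers terminate
}

fun main() = runBlocking {
    val side = Channel<Int>(Channel.UNLIMITED) // buffered so tee never blocks on it
    val source = produce { for (i in 1..3) send(i) }
    val downstream = mutableListOf<Int>()
    for (x in tee(source, side)) downstream.add(x)
    val copies = mutableListOf<Int>()
    for (x in side) copies.add(x)
    println(downstream) // prints [1, 2, 3]
    println(copies)     // prints [1, 2, 3]
}
```

Note the unbounded side channel in this sketch: with a rendezvous side channel, the pipeline would suspend until the listener keeps pace, which may or may not be the desired backpressure behavior.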
Current Alternatives
As @jcornaz points out in the discussion below, transforms (with state) can be implemented as extensions of `ReceiveChannel`. The snippets below (from this test) contrast the two approaches (extensions are cleaner):
With Transforms/Pipes
```kotlin
dataProducer()                   // emit chunks of 512 bytes
    .pipe(tee(verifyLength))     // verify that we're getting all of the data
    .pipe(splitter(NL))          // split into lines
    .pipe(counter(lineCounter))  // count lines
    .pipe(splitter(WS, true))    // split lines into words (words are aligned)
    .pipe(counter(wordCounter))  // count words
    .drainAndJoin()              // wait
```
With Extensions
```kotlin
dataProducer()                   // emit chunks of 512 bytes
    .tee(verifyLength)           // verify that we're getting all of the data
    .split(NL)                   // split into lines
    .countMessages(lineCounter)  // count lines
    .split(WS, true)             // split lines into words (words are aligned)
    .countMessages(wordCounter)  // count words
    .drain()                     // wait
```
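For concreteness, a sketch of one such extension-style operator. `split` here is an illustrative stand-in for the splitter above (not the test's actual code): the trailing partial line is the confined state carried across chunk boundaries.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.*

// Illustrative sketch only: a splitter written as a ReceiveChannel
// extension. The partial trailing line is the stage's confined state.
fun ReceiveChannel<String>.split(
    scope: CoroutineScope,
    delim: Char
): ReceiveChannel<String> = scope.produce {
    var carry = ""                           // partial line from the previous chunk
    for (chunk in this@split) {
        val parts = (carry + chunk).split(delim)
        carry = parts.last()                 // may be completed by the next chunk
        for (line in parts.dropLast(1)) send(line)
    }
    if (carry.isNotEmpty()) send(carry)      // flush the final line
}

fun main() = runBlocking {
    val chunks = produce { send("one\ntw"); send("o\nthree") }
    val lines = mutableListOf<String>()
    for (line in chunks.split(this, '\n')) lines.add(line)
    println(lines) // prints [one, two, three]
}
```

In this sketch the scope is passed explicitly; the extension style in the snippet above presumably threads the context some other way.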
Question
Is this something the team considers worth pursuing?
Issue Analytics
- Created: 5 years ago
- Comments: 12 (10 by maintainers)
Top GitHub Comments
It's not clear to me how this new API is better than standard map/flatMap. Could you show some working, self-contained example? I would like to try to rewrite it using existing channel operators and compare results.
Data-driven concurrency is cool! See also: https://dl.acm.org/citation.cfm?doid=3154814.3162014
Building Scalable, Highly Concurrent & Fault Tolerant Systems - Lessons Learned: https://www.slideshare.net/jboner/building-scalable-highly-concurrent-fault-tolerant-systems-lessons-learned?from_action=save

Dataflow concurrency:
- Deterministic
- Declarative
- Data-driven
- Threads are suspended until data is available
- Lazy & on-demand
- No difference between concurrent code and sequential code
- Examples: Akka & GPars