functions to use in a pipeline
In this ticket, I’d like to discuss two new functions and a couple of changes. But first, let me explain the context. Last summer I experimented with data pipelines, using async iterables instead of streams. When working with data pipelines, you often need to fork an iterable and apply a different sub-pipeline to each fork. Here’s an example:
...
const [copy1, copy2] = asyncTee(iterable, 2)
const transformed1 = aPipeline(copy1)
const transformed2 = anotherPipeline(copy2)
const merged = asyncMerge(asyncMergeByReadiness, [transformed1, transformed2])
for await (const item of merged) {
  // ...
}
I’d like the previous code to be simpler. So here’s a list of proposals:
1 - merge / asyncMerge: default function
merge and asyncMerge currently require a function that decides which item to consume first. Adding a sensible default would simplify the most common cases. I propose using a round-robin algorithm as the default for merge and asyncMergeByReadiness as the default for asyncMerge.
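To make the round-robin default concrete, here is a minimal sketch for the synchronous case. The name roundRobinMergeSketch is hypothetical and this is not the library's implementation; it only illustrates the ordering such a default would produce, assuming exhausted inputs are simply skipped.

```javascript
// Hypothetical sketch of a round-robin merge; not the library's code.
// Takes one item from each input in turn until every input is exhausted.
function* roundRobinMergeSketch(iterables) {
  // Open one iterator per input iterable.
  const iterators = Array.from(iterables, (it) => it[Symbol.iterator]())
  while (iterators.length > 0) {
    for (let i = 0; i < iterators.length; ) {
      const { done, value } = iterators[i].next()
      if (done) {
        // Exhausted: drop it and stay on the same index.
        iterators.splice(i, 1)
      } else {
        yield value
        i++
      }
    }
  }
}

// Usage: items alternate between inputs; the longer input finishes alone.
const mergedSketch = [...roundRobinMergeSketch([[1, 2, 3], ['a', 'b']])]
// → [1, 'a', 2, 'b', 3]
```

An async default based on readiness would instead yield whichever input resolves first, so its output order depends on timing rather than on position.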
2 - fork: a better tee
The tee API comes directly from Python’s itertools. It would be very convenient to have a curriable version. Even better if it returned an iterable, like @KSXGitHub did in multiPartition, so we don’t need to specify the number of iterables returned. Something like:
const [copy1, copy2] = asyncFork(iterable)
// equivalent to
const [copy1, copy2] = asyncTee(iterable, 2)
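One way such a fork could work is sketched below. The name asyncForkSketch and the buffering strategy are assumptions for illustration, not the library's implementation: the function returns an endless iterable of copies, and array destructuring takes only as many copies as it needs before closing it.

```javascript
// Hypothetical sketch; not the library's code.
// Returns an endless (sync) iterable of async copies of `iterable`.
function asyncForkSketch(iterable) {
  const source = iterable[Symbol.asyncIterator]()
  const queues = [] // one buffer of pending results per copy

  async function* copy(queue) {
    while (true) {
      if (queue.length === 0) {
        // Pull once from the source and share the pending result
        // with every registered copy, including this one.
        const pending = source.next()
        for (const q of queues) q.push(pending)
      }
      const { done, value } = await queue.shift()
      if (done) return
      yield value
    }
  }

  function* copies() {
    while (true) {
      const queue = []
      queues.push(queue) // register before the copy is consumed
      yield copy(queue)
    }
  }

  return copies()
}

// Usage (assumes `iterable` is an async iterable):
// const [copy1, copy2] = asyncForkSketch(iterable)
```

Two caveats of this sketch: a copy created after consumption has started misses the items already pulled, and a copy that is never consumed keeps buffering results, so memory grows with the gap between the fastest and slowest consumer.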
3 - apply multiple pipelines to an array of iterables
I have not come up with a good name yet. Let’s call this function pipeMultiple for now.
const [transformed1, transformed2] = pipeMultiple([aPipeline, anotherPipeline], [copy1, copy2])
// equivalent to
const transformed1 = aPipeline(copy1)
const transformed2 = anotherPipeline(copy2)
It should also be curriable.
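A minimal sketch of what this could look like, including the curried form, follows. The name pipeMultipleSketch and the choice to throw when there are fewer iterables than pipelines are assumptions, not a settled design. It pulls exactly one iterable per pipeline, so it also works when the second argument is an endless iterable of copies.

```javascript
// Hypothetical sketch; not the library's code.
// Applies pipelines[i] to the i-th iterable and returns the results as an array.
function pipeMultipleSketch(pipelines, iterables) {
  // Curried form: with only the pipelines given, wait for the iterables.
  if (iterables === undefined) {
    return (its) => pipeMultipleSketch(pipelines, its)
  }
  const iterator = iterables[Symbol.iterator]()
  const results = pipelines.map((pipeline) => {
    const { done, value } = iterator.next()
    if (done) throw new Error('fewer iterables than pipelines')
    return pipeline(value)
  })
  // Close the outer iterator in case it is endless or holds resources.
  if (typeof iterator.return === 'function') iterator.return()
  return results
}

// Usage with two toy pipelines over arrays:
const double = (it) => Array.from(it, (x) => x * 2)
const increment = (it) => Array.from(it, (x) => x + 1)
const [doubled, incremented] = pipeMultipleSketch([double, increment])([[1, 2], [3, 4]])
// doubled → [2, 4], incremented → [4, 5]
```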
the result
...
const merged = pipe(
  asyncFork(),
  pipeMultiple([aPipeline, anotherPipeline]),
  asyncMerge()
)(iterable)
for await (const item of merged) {
  // ...
}
Issue Analytics
- Created 5 years ago
- Comments:18 (3 by maintainers)
Top GitHub Comments
@sithmel I see this is where the suggestion for fork came from and (consulting the python docs) the core of its implementation. I suggested on @KSXGitHub’s diff that we just have a single tee function, whose default fork count is Infinity instead of 2. I guess I can sort of see why it might be good not to touch the tee default of 2 – particularly because the name, tee, implies two already.
Wait, I’ve got it, we can call the new function infiniTee! XD
There are already other tickets covering what is missing here: #158 #169