functions to use in a pipeline

In this ticket, I’d like to discuss 2 new functions and a couple of changes. But first, let me explain the context. Last summer I experimented with data pipelines, using async iterables instead of streams. With data pipelines you often need to fork the iterable and apply a different sub-pipeline to each fork. Here’s an example:

...
const [copy1, copy2] = asyncTee(iterable, 2)
const transformed1 = aPipeline(copy1)
const transformed2 = anotherPipeline(copy2)
const merged = asyncMerge(asyncMergeByReadiness, [transformed1, transformed2])
for await (const item of merged) {
  // ...
}

I’d like the previous code to be simpler. So here’s a list of proposals:

1 - merge - asyncMerge default function

merge and asyncMerge currently require a function that decides which item to consume first. Adding a sensible default would simplify the most common cases. I propose using a round-robin algorithm as the default for merge and asyncMergeByReadiness as the default for asyncMerge.
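To make the two proposed defaults concrete, here is a minimal sketch of what each strategy does. The names `roundRobinMerge` and `mergeByReadiness` are hypothetical stand-ins written for this illustration; they are not the library's implementation, and they ignore concerns such as closing the inputs when the consumer stops early.

```javascript
// Hypothetical helper: round-robin merge of synchronous iterables.
// Takes one item from each input in turn and drops inputs as they run out.
function* roundRobinMerge(iterables) {
  let iterators = iterables.map((it) => it[Symbol.iterator]());
  while (iterators.length > 0) {
    const stillOpen = [];
    for (const iterator of iterators) {
      const { done, value } = iterator.next();
      if (!done) {
        yield value;
        stillOpen.push(iterator);
      }
    }
    iterators = stillOpen;
  }
}

// Hypothetical helper: merge async iterables "by readiness".
// Yields whichever input produces its next item first.
async function* mergeByReadiness(iterables) {
  const iterators = iterables.map((it) => it[Symbol.asyncIterator]());
  // One pending next() promise per open input, tagged with the input's index.
  const pending = new Map();
  const request = (index) =>
    pending.set(
      index,
      iterators[index].next().then((result) => ({ index, result }))
    );
  iterators.forEach((_, index) => request(index));

  while (pending.size > 0) {
    const { index, result } = await Promise.race(pending.values());
    if (result.done) {
      pending.delete(index); // this input is exhausted
    } else {
      request(index); // ask the same input for its next item
      yield result.value;
    }
  }
}

console.log([...roundRobinMerge([[1, 2, 3], ["a", "b"]])]);
// → [ 1, 'a', 2, 'b', 3 ]
```

Round robin gives a predictable order for synchronous inputs, while the readiness strategy avoids letting one slow async input hold up the others. The readiness order depends on timing.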

2 - fork: a better tee

The tee API comes directly from Python’s itertools. It would be very convenient to have a curriable version. Even better if it returns an iterable, like @KSXGitHub did for multiPartition, so we don’t need to specify the number of iterables returned. Something like:

const [copy1, copy2] = asyncFork(iterable)
// equivalent to
const [copy1, copy2] = asyncTee(iterable, 2)
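One way such a fork could work is sketched below, assuming that every copy is obtained before any of them is consumed (as happens with array destructuring). The name `asyncForkSketch` and its internals are hypothetical and not the library's implementation. Note that each copy buffers items it has not read yet, so an abandoned copy keeps growing its buffer.

```javascript
// Hypothetical sketch of the proposed fork; not the library's implementation.
// Returns a lazy, unbounded iterable of independent copies of `iterable`.
function* asyncForkSketch(iterable) {
  const source = iterable[Symbol.asyncIterator]();
  const queues = []; // one buffer per copy handed out so far
  let inFlight = null; // shared promise for the read currently in progress

  // Read one result from the source and deliver it to every copy's buffer.
  // Concurrent callers share the same read instead of over-reading.
  const pull = () => {
    if (inFlight === null) {
      inFlight = source.next().then((result) => {
        inFlight = null;
        for (const queue of queues) queue.push(result);
      });
    }
    return inFlight;
  };

  async function* makeCopy(queue) {
    while (true) {
      if (queue.length === 0) await pull();
      const result = queue.shift();
      if (result.done) return;
      yield result.value;
    }
  }

  // Hand out as many copies as the caller asks for.
  while (true) {
    const queue = [];
    queues.push(queue); // register the buffer before any item is read
    yield makeCopy(queue);
  }
}

// Usage: destructuring takes exactly as many copies as there are names.
// async function* numbers() { yield 1; yield 2; yield 3 }
// const [copy1, copy2] = asyncForkSketch(numbers())
```

Because the outer function is a synchronous generator, the number of copies is decided by how many the caller pulls rather than by an explicit count argument.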

3 - apply multiple pipelines to an array of iterables

I could not come up with a good name yet. Let’s call this function pipeMultiple for now.

const [transformed1, transformed2] = pipeMultiple([aPipeline, anotherPipeline], [copy1, copy2])
// equivalent to
const transformed1 = aPipeline(copy1)
const transformed2 = anotherPipeline(copy2)

It should also be curriable.
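A minimal sketch of such a function, including the curried form, could look like this. It is only an illustration under the placeholder name used above. Reading the inputs through an iterator (rather than by array index) is an assumption made here so that a lazy or unbounded iterable of copies would also work.

```javascript
// Hypothetical sketch of pipeMultiple; the name is the placeholder from the text.
// Applies pipelines[i] to the i-th iterable and returns the results as an array.
function pipeMultiple(pipelines, iterables) {
  // Curried form: pipeMultiple(pipelines) returns a function awaiting the iterables.
  if (iterables === undefined) {
    return (laterIterables) => pipeMultiple(pipelines, laterIterables);
  }
  const iterator = iterables[Symbol.iterator]();
  try {
    return pipelines.map((pipeline) => {
      const { done, value } = iterator.next();
      if (done) throw new Error("pipeMultiple: fewer iterables than pipelines");
      return pipeline(value);
    });
  } finally {
    // Release the input if it is a generator or another closable iterator.
    iterator.return?.();
  }
}

const double = (items) => items.map((x) => x * 2);
const increment = (items) => items.map((x) => x + 1);

console.log(pipeMultiple([double, increment], [[1, 2], [3, 4]]));
// → [ [ 2, 4 ], [ 4, 5 ] ]
console.log(pipeMultiple([double, increment])([[1, 2], [3, 4]]));
// → [ [ 2, 4 ], [ 4, 5 ] ]
```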

the result

...
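// pipe returns a function, so it still has to be applied to the source iterable
const forkAndMerge = pipe(
  asyncFork(),
  pipeMultiple([aPipeline, anotherPipeline]),
  asyncMerge()
)

for await (const item of forkAndMerge(iterable)) {
  // ...
}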
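The composition above relies on every stage being curried: each call such as `asyncFork()` returns a function that is still waiting for its input, and `pipe` chains those functions from left to right. The sketch below illustrates that shape with synchronous, array-based stand-ins. `pipe`, `fork`, `applyEach` and `concatAll` are hypothetical helpers defined only for this example, not the library's functions.

```javascript
// Left-to-right function composition: pipe(f, g)(x) is g(f(x)).
const pipe = (...fns) => (input) => fns.reduce((acc, fn) => fn(acc), input);

// Hypothetical, synchronous stand-ins for the three proposed stages.
// Arrays can be iterated repeatedly, so "forking" is just handing out the array twice.
const fork = () => (items) => [items, items];
const applyEach = (pipelines) => (copies) =>
  pipelines.map((pipeline, index) => pipeline(copies[index]));
const concatAll = () => (parts) => parts.flat();

const double = (items) => items.map((x) => x * 2);
const negate = (items) => items.map((x) => -x);

// Every stage is called without its data argument, so each one is a function.
const run = pipe(fork(), applyEach([double, negate]), concatAll());

console.log(run([1, 2, 3]));
// → [ 2, 4, 6, -1, -2, -3 ]
```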

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 18 (3 by maintainers)

Top GitHub Comments

1 reaction
conartist6 commented, Jan 23, 2019

@sithmel I see this is where the suggestion for fork came from and (consulting the python docs) the core of its implementation. I suggested on @KSXGitHub’s diff that we just have a single tee function, whose default fork count is Infinity instead of 2. I guess I can sort of see why it might be good not to touch the tee default of 2 – particularly because the name, tee, implies two already.

Wait, I’ve got it, we can call the new function infiniTee! XD

0 reactions
sithmel commented, Feb 24, 2019

There are already other tickets covering what is missing here: #158 #169
