Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Proposal: Combine overload for IncrementalValuesProvider<T>

See original GitHub issue

Background and Motivation

I find myself often ending up in a situation where I’d like to combine two IncrementalValuesProvider<T> instances, essentially “zipping” them. There doesn’t seem to be an API for doing this though, as the existing Combine methods only accept one of the left/right values being an IncrementalValueProvider<T> instance. Consider the following simplified scenario:

IncrementalValuesProvider<INamedTypeSymbol> symbols = context.SyntaxProvider.CreateSyntaxProvider(...);

IncrementalValuesProvider<string> left = symbols.Select(static (item, token) => GatherInfoA(item));
IncrementalValuesProvider<string> right = symbols.Select(static (item, token) => GatherInfoB(item));

context.RegisterSourceOutput(left, static (context, item) => { });

// This doesn't compile: no matching overload. I'd like to zip left and right together
// here as I need to access matching items from both when generating code. I don't want
// to have to recompute the information in left again in this right pipeline subtree.
context.RegisterSourceOutput(right.Combine(left), static (context, item) => { });

The rationale here is that:

The intermediate information in left is used on its own in a first source production node
That same information is also needed in the source production node taking right
I would like not to have to call GatherInfoA() again for each item in right, as I already have that info
Additionally calling GatherInfo_() might be expensive, so I really just want to reuse the result I have

Proposed API

namespace Microsoft.CodeAnalysis
{
    public static class IncrementalValueProviderExtensions
    {
+        public static IncrementalValuesProvider<(TLeft Left, TRight Right)> Combine<TLeft, TRight>(this IncrementalValuesProvider<TLeft> provider1, IncrementalValuesProvider<TRight> provider2);
    }
}

Alternative solutions

Consider this scenario:

- SOURCE
   |
   | - Data A ---> Output
   | ---|--- Data B ---> Output
        |      |
        |------|--- Data C ---> Output

One possible workaround doable today is to do something like this:

dataA
.Collect()
.Combine(dataB.Collect())
.SelectMany(static (item, token) =>
    item.Left.Zip(item.Right, static (Left, Right) => (Left, Right)));

Which does yield back an IncrementalValuesProvider<(A, B)> sequence, but this doesn’t seem efficient at all. The fact I’m doing Collect() on both means that every time a single item in the sequences is removed/added/updated, the entire collection will be reevaluated, instead of just that one item. What I’d like instead is to just have individual items that are changed to be queried for reevaluation, with the guarantee that if both source sequences have no incompatible filters on them (that is, either they have no Where calls, or if they do, they have one that applies the same filtering on both sequences), then I’ll just get asked to recompute a single pair of items in this resulting values provider combining the two.

Notes

In order for this to work, Roslyn needs to guarantee that items in the same position across different IncrementalValuesProvider<T> instance will match and refer to the same source item. As in, this will only work if Roslyn can guarantee that transformations on the values providers are “stable”: the two input sources will always have the same number of items when processed (if the user hasn’t messed up filtering) and that items will not be reordered in just one of the two providers. That is, if source item A is used to produce B and C in the transformed producers left and right, then calling Combine on them should guarantee that each resulting pair will correctly associate items B and C for each original source item A used to produce them.

cc. @sharwell and @jkoritzinsky who will involved in this conversation on Discord 🙂

Issue Analytics

State:
Created 2 years ago
Comments:5 (5 by maintainers)

Top GitHub Comments

1reaction

Sergio0694commented, Jan 6, 2022

Wait, I’m confused. While the name would be the same, this doesn’t seem to be the semantics I’m proposing in this issue. I don’t want a cross-join (MxN items), I want a zip of two sequences of equal length. I do agree with @sharwell that maybe a different name (eg. Zip) would be better, if Combine would instead suggest a cross-join behavior like the one you mentioned 🙂

0reactions

chsienkicommented, Dec 15, 2021

We deliberately left out the Combine which performs a cross join when designing the APIs. It is possible to manually perform one with the APIs today however:

IncrementalValuesProvider<INamedTypeSymbol> symbols = context.SyntaxProvider.CreateSyntaxProvider(...);

IncrementalValuesProvider<string> left = symbols.Select(static (item, token) => GatherInfoA(item));
IncrementalValuesProvider<string> right = symbols.Select(static (item, token) => GatherInfoB(item));

var crossJoin = left.Combine(right.Collect()).SelectMany(static (pair, _) => pair.right.Select(static rightItem => (pair.left, rightItem));

This isn’t particularly inefficient. The gather’s are called only as needed, and the collect() will be considered cached if all items in it are. When the right hand side changes, the select many will always be called, but the resulting tuples will essentially be ‘cached out’ in that most of them produced won’t be modified so no downstream nodes will be executed for them. Given that the SelectMany is cheap (comparatively) to run, you shouldn’t see any perf downsides to doing it this way.