Proposal: Combine overload for IncrementalValuesProvider<T>
Background and Motivation
I often end up in a situation where I’d like to combine two IncrementalValuesProvider<T> instances, essentially “zipping” them. There doesn’t seem to be an API for this though, as the existing Combine overloads all require one of the left/right inputs to be a single-value IncrementalValueProvider<T> instance. Consider the following simplified scenario:
IncrementalValuesProvider<INamedTypeSymbol> symbols = context.SyntaxProvider.CreateSyntaxProvider(...);
IncrementalValuesProvider<string> left = symbols.Select(static (item, token) => GatherInfoA(item));
IncrementalValuesProvider<string> right = symbols.Select(static (item, token) => GatherInfoB(item));
context.RegisterSourceOutput(left, static (context, item) => { });
// This doesn't compile: no matching overload. I'd like to zip left and right together
// here as I need to access matching items from both when generating code. I don't want
// to have to recompute the information in left again in this right pipeline subtree.
context.RegisterSourceOutput(right.Combine(left), static (context, item) => { });
The rationale here is that:
- The intermediate information in left is used on its own in a first source production node
- That same information is also needed in the source production node taking right
- I would like not to have to call GatherInfoA() again for each item in right, as I already have that info
- Additionally, calling GatherInfo_() might be expensive, so I really just want to reuse the result I have
Proposed API
namespace Microsoft.CodeAnalysis
{
    public static class IncrementalValueProviderExtensions
    {
+       public static IncrementalValuesProvider<(TLeft Left, TRight Right)> Combine<TLeft, TRight>(this IncrementalValuesProvider<TLeft> provider1, IncrementalValuesProvider<TRight> provider2);
    }
}
Alternative solutions
Consider this scenario:
- SOURCE
|
| - Data A ---> Output
| ---|--- Data B ---> Output
| |
|------|--- Data C ---> Output
One possible workaround available today is something like this:
dataA
    .Collect()
    .Combine(dataB.Collect())
    .SelectMany(static (item, token) =>
        item.Left.Zip(item.Right, static (Left, Right) => (Left, Right)));
This does yield back an IncrementalValuesProvider<(A, B)> sequence, but it doesn’t seem efficient at all. Because I’m calling Collect() on both providers, every time a single item in either sequence is removed, added or updated, the entire collection is reevaluated instead of just that one item.

What I’d like instead is for only the individual items that changed to be queried for reevaluation. The guarantee would be that if both source sequences have no incompatible filters on them (that is, either they have no Where calls, or they both apply the same filtering), then I’ll only be asked to recompute a single pair of items in the resulting values provider that combines the two.
Notes
In order for this to work, Roslyn needs to guarantee that items in the same position across different IncrementalValuesProvider<T> instances will match and refer to the same source item. As in, this will only work if Roslyn can guarantee that transformations on the values providers are “stable”: the two input sources will always have the same number of items when processed (if the user hasn’t messed up filtering), and items will not be reordered in just one of the two providers. That is, if source item A is used to produce B and C in the transformed providers left and right, then calling Combine on them should guarantee that each resulting pair correctly associates items B and C for each original source item A used to produce them.
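To make the positional requirement concrete, here is a minimal sketch that uses plain LINQ over in-memory lists as a stand-in for the two providers. It is not Roslyn API: the class name, the method names and the string-based GatherInfoA / GatherInfoB bodies are all hypothetical, chosen only to show what "same position, same source item" means and how a filter applied to one side breaks it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Plain LINQ stand-in for the pipeline described above; none of this is Roslyn API.
public static class ZipAlignmentSketch
{
    // Hypothetical stand-ins for the GatherInfoA / GatherInfoB transforms.
    public static string GatherInfoA(string symbol) => symbol + ":A";
    public static string GatherInfoB(string symbol) => symbol + ":B";

    // Both sides are per-item projections of the same source, so position i
    // on the left and position i on the right come from the same source item.
    public static List<(string Left, string Right)> ZipAligned(IEnumerable<string> source)
    {
        List<string> items = source.ToList();
        return items.Select(s => GatherInfoA(s))
                    .Zip(items.Select(s => GatherInfoB(s)), (l, r) => (Left: l, Right: r))
                    .ToList();
    }

    // A filter applied to only one side shifts positions, so the zip silently
    // pairs results that were produced from different source items.
    public static List<(string Left, string Right)> ZipWithOneSidedFilter(
        IEnumerable<string> source, Func<string, bool> leftOnlyFilter)
    {
        List<string> items = source.ToList();
        return items.Where(leftOnlyFilter)
                    .Select(s => GatherInfoA(s))
                    .Zip(items.Select(s => GatherInfoB(s)), (l, r) => (Left: l, Right: r))
                    .ToList();
    }
}
```

With the source X, Y, Z, ZipAligned returns three pairs that each share a source item, such as (Y:A, Y:B). ZipWithOneSidedFilter with a filter that drops X returns only two pairs, starting with (Y:A, X:B), which is the kind of mismatch the stability guarantee is meant to rule out.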
cc. @sharwell and @jkoritzinsky who were involved in this conversation on Discord 🙂
Comments: 5 (5 by maintainers)
Wait, I’m confused. While the name would be the same, this doesn’t seem to be the semantics I’m proposing in this issue. I don’t want a cross-join (MxN items), I want a zip of two sequences of equal length. I do agree with @sharwell that maybe a different name (e.g. Zip) would be better, if Combine would instead suggest a cross-join behavior like the one you mentioned 🙂

We deliberately left out the Combine which performs a cross join when designing the APIs. It is possible to manually perform one with the APIs today however:

This isn’t particularly inefficient. The gathers are called only as needed, and the Collect() will be considered cached if all items in it are. When the right hand side changes, the SelectMany will always be called, but the resulting tuples will essentially be ‘cached out’, in that most of the tuples produced won’t be modified, so no downstream nodes will be executed for them. Given that the SelectMany is cheap (comparatively) to run, you shouldn’t see any perf downsides to doing it this way.
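To illustrate how a cross join differs from the zip requested in this issue, here is a small sketch in plain LINQ. It is not the snippet from the comment and does not use the Roslyn API: CrossJoinSketch and CrossJoin are hypothetical names, and the array, Select and SelectMany calls only model the general shape of collecting one side, combining it with each item of the other, and flattening the result.

```csharp
using System.Collections.Generic;
using System.Linq;

// Plain LINQ model of a manual cross join; names are illustrative, not Roslyn API.
public static class CrossJoinSketch
{
    public static List<(TLeft Left, TRight Right)> CrossJoin<TLeft, TRight>(
        IEnumerable<TLeft> left, IEnumerable<TRight> right)
    {
        // Models collecting the right side: materialize it once as a whole.
        TRight[] collected = right.ToArray();

        return left
            // Models combining: every left item is paired with the whole right collection.
            .Select(l => (Left: l, Right: collected))
            // Models flattening: one output tuple per (left item, right item) combination.
            .SelectMany(pair => pair.Right.Select(r => (Left: pair.Left, Right: r)))
            .ToList();
    }
}
```

For a left side with 3 items and a right side with 2 items, CrossJoin returns 3 x 2 = 6 tuples, whereas a zip of two equal-length sequences of N items returns exactly N. That difference in cardinality is why the two operations may deserve different names.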