Support SQL translation for .Net 6 Linq's MinBy/MaxBy Methods
I’m using .Net 6 and the new MinBy/MaxBy methods are great. However, there’s no default translation to SQL for them - despite this being seemingly a very natural thing to be able to do.
I can do this: db.SomeDbSet.ToList().MinBy( x => x.SomeField )
but obviously the problem with that is that it’s going to have to load the entire set into memory, and won’t use any of the DB indexes etc, so will be slow.
Would be great if this could be added before EFCore 6 is released (or as an early preview for EFCore 7).
I’m using the SQLite provider, if it makes a difference.
Issue Analytics
- State:
- Created 2 years ago
- Reactions:12
- Comments:21 (12 by maintainers)
Implementation Proposal
Ideally, we will translate MinBy/MaxBy into “lower”, already supported expressions. These get translated to SQL correctly even now. I’m actually using precisely these expressions in production with SQL Server. With the translations involving only expression juggling, no provider-specific work will be required.
Scenario 1: Ungrouped MaxBy
Translate MaxBy(Selector) into OrderByDescending(Selector).Take(1). Note that Selector could be a tuple, requiring a chain of ThenByDescending().
Simplest form
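The original snippet for this case is not reproduced here. As a minimal sketch of the rewrite only, assuming hypothetical in-memory rows with Id and CreationDateTime, the following runs both forms in LINQ to Objects and compares which row each one picks. It does not exercise EF Core or any SQL provider.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory rows (not from the issue); the property names
// mirror the index columns named in this proposal.
var orders = new[]
{
    (Id: 1, CreationDateTime: new DateTime(2021, 1, 1)),
    (Id: 2, CreationDateTime: new DateTime(2021, 3, 1)),
    (Id: 3, CreationDateTime: new DateTime(2021, 2, 1)),
};

// What the user writes with the .NET 6 operator.
var viaMaxBy = orders.MaxBy(o => o.CreationDateTime);

// The "lower" form the proposal would rewrite it to.
var viaOrderBy = orders
    .OrderByDescending(o => o.CreationDateTime)
    .Take(1)
    .Single();

// Both forms should select the same row.
Console.WriteLine(viaMaxBy.Id == viaOrderBy.Id);
```

On an empty source the two forms can differ in how they report "no row", so a real translation would need to settle that separately.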
Using index CreationDateTime, [Id].
With complexities
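The tuple case from Scenario 1 can be sketched the same way. This is an illustration under assumptions, not the author's example: the rows and the (IsDeleted, CreationDateTime) selector are hypothetical, and it runs in LINQ to Objects only. An EF Core query would go through expression trees, which cannot contain tuple literals, so the exact selector shape there is left open.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory rows (not from the issue).
var orders = new[]
{
    (Id: 1, IsDeleted: false, CreationDateTime: new DateTime(2021, 3, 1)),
    (Id: 2, IsDeleted: true,  CreationDateTime: new DateTime(2021, 1, 1)),
    (Id: 3, IsDeleted: true,  CreationDateTime: new DateTime(2021, 2, 1)),
};

// Tuple selector: value tuples compare element by element, left to right.
var viaMaxBy = orders.MaxBy(o => (o.IsDeleted, o.CreationDateTime));

// Rewritten form: one ordering clause per tuple element.
var viaOrderBy = orders
    .OrderByDescending(o => o.IsDeleted)
    .ThenByDescending(o => o.CreationDateTime)
    .Take(1)
    .Single();

// Both forms should select the same row.
Console.WriteLine(viaMaxBy.Id == viaOrderBy.Id);
```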
The actual translation still only has to deal with the tuple in MaxBy. The rest is just distractions.
Using index IsDeleted, CreationDateTime, [Id].
Scenario 2: Grouped MaxBy
This one is better understood from the code, but here is the theory: Recognize a MaxBy inside a Select, where the latter is taking an IGrouping<TKey, TElement> as its source. Translate MaxBy(Selector) into Where(MatchesGroupKey).OrderByDescending(GroupKey).ThenByDescending(Selector).First().Id. Additionally, directly after the Select, perform a Join to obtain the winning rows from their IDs. This has the added benefit of isolating the intricacies of this part of the query, making follow-up user syntax like Select, Join, or OrderBy work without breaking the intended plan.
Simplest form
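As a rough sketch of the grouped rewrite described above, assuming hypothetical in-memory rows and assuming the Where runs against the same underlying set that was grouped, the following compares the user-facing form with the rewritten form in LINQ to Objects. It shows the shape of the expressions only and says nothing about the SQL a provider would emit.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory rows (not from the issue).
var orders = new[]
{
    (Id: 1, CustomerId: 10, CreationDateTime: new DateTime(2021, 1, 1)),
    (Id: 2, CustomerId: 10, CreationDateTime: new DateTime(2021, 3, 1)),
    (Id: 3, CustomerId: 20, CreationDateTime: new DateTime(2021, 2, 1)),
    (Id: 4, CustomerId: 20, CreationDateTime: new DateTime(2021, 1, 15)),
};

// What the user writes: the latest order per customer.
var viaMaxBy = orders
    .GroupBy(o => o.CustomerId)
    .Select(g => g.MaxBy(o => o.CreationDateTime))
    .Select(o => o.Id)
    .OrderBy(id => id)
    .ToArray();

// Rewritten form: per group, find the winning row's Id,
// then join back to fetch the full rows.
var viaRewrite = orders
    .GroupBy(o => o.CustomerId)
    .Select(g => orders
        .Where(o => o.CustomerId == g.Key)          // MatchesGroupKey
        .OrderByDescending(o => o.CustomerId)       // GroupKey (constant after the Where)
        .ThenByDescending(o => o.CreationDateTime)  // Selector
        .First().Id)
    .Join(orders, id => id, o => o.Id, (id, o) => o)
    .Select(o => o.Id)
    .OrderBy(id => id)
    .ToArray();

// Both forms should yield the same set of winning Ids.
Console.WriteLine(viaMaxBy.SequenceEqual(viaRewrite));
```

The trailing Select and OrderBy in both pipelines are only there to make the two results comparable; they are not part of the proposed translation.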
Using index CustomerId, CreationDateTime, [Id].
Having the translation always emit a left-complete ordering (including the CustomerId that was made constant by the Where) helps to (A) clarify the intended index and (B) work around a MySQL optimizer bug that, in subqueries, won’t recognize the appropriate index without it.
Notably, in Select(group => [...].MaxBy(x => [...])), MaxBy must be the final expression inside the Select. Attempting something like .Select(group => group.MaxBy(x => x.CreationDateTime).SomeOtherProperty) should result in an untranslatable query. It would prevent us from taking the ID and appending the Join. This constraint should be acceptable: the only use case I can think of is selecting a single property instead of the entire entity. That can still be achieved by adding Select(x => x.SomeOtherProperty) at the end of the query.
With complexities
To maximize complexity, we’ll use a composite group key (CustomerId, ShopId) and a composite selector (IsDeleted, CreationDateTime). We’ll also add some conditions, such as one that cuts off a time window using the index, and another that scans over a few mismatching items.
Using index CustomerId, ShopId, IsDeleted, CreationDateTime, [Id].
You’re right. Based on further testing, I can now say that my earlier claims regarding subqueries having a greater constant overhead certainly do not apply to SQL Server. It works as you might expect: it interprets the query to understand what you want, and produces a plan. Whether you expressed the query as a join or a dependent subquery makes little difference to it.
I was originally trained on MySQL 5.6, so it is very possible that what I claimed applies to that alone. I have not yet checked the behavior on MySQL 8, although I’m quite curious.