Support SQL translation for .Net 6 Linq's MinBy/MaxBy Methods

I’m using .Net 6 and the new MinBy/MaxBy methods are great. However, there’s no default translation to SQL for them - despite this being seemingly a very natural thing to be able to do.

I can do this: db.SomeDbSet.ToList().MinBy(x => x.SomeField), but that loads the entire set into memory and bypasses the database's indexes, so it will be slow.
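Until a translation exists, a common workaround is to write the query as OrderBy(...).FirstOrDefault(), a shape EF Core can already translate and evaluate on the server. The sketch below only checks that the two forms pick the same element. The Item record and the in-memory list are hypothetical stand-ins for the entity and for db.SomeDbSet, so no SQL is generated here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows standing in for the contents of db.SomeDbSet.
var rows = new List<Item>
{
    new Item(1, 30),
    new Item(2, 10),
    new Item(3, 20),
};

// AsQueryable() only mimics the IQueryable shape of a DbSet.
IQueryable<Item> query = rows.AsQueryable();

// In-memory MinBy: what the post wants to avoid running on a full table.
var viaMinBy = rows.MinBy(x => x.SomeField);

// Equivalent ordered form: the shape a query provider can translate.
var viaOrderBy = query.OrderBy(x => x.SomeField).FirstOrDefault();

// Records compare by value, so this checks that both picked the same row.
Console.WriteLine(viaMinBy == viaOrderBy); // prints True

record Item(int Id, int SomeField);
```

With a real DbSet, the second form is the one to use. The first is shown only for comparison.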

Would be great if this could be added before EFCore 6 is released (or as an early preview for EFCore 7).

I’m using the SQLite provider, if it makes a difference.

Issue Analytics

State:
Created 2 years ago
Reactions: 12
Comments: 21 (12 by maintainers)

Top GitHub Comments

2 reactions

Timovzl commented, Aug 15, 2023

Implementation Proposal

Ideally, we will translate MinBy/MaxBy into “lower”, already supported expressions. These already translate to SQL correctly today. I’m using precisely these expressions in production with SQL Server. Since the translation involves only expression juggling, no provider-specific work will be required.

Scenario 1: Ungrouped MaxBy

Translate MaxBy(Selector) into OrderByDescending(Selector).Take(1). Note that Selector could be a tuple, requiring a chain of ThenByDescending().

Simplest form

Using index CreationDateTime, [Id].

// Usage
this.DbContext.Orders
	.MaxBy(x => x.CreationDateTime);

// Proposed translation to lower expressions
this.DbContext.Orders
	.OrderByDescending(x => x.CreationDateTime)
	.Take(1);

-- Approximation of expected SQL
SELECT TOP(1) o.*
FROM Orders o
ORDER BY o.CreationDateTime DESC
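The equivalence behind this lowering can be checked in memory with LINQ to Objects. This is a minimal sketch, assuming a hypothetical Order record and an array in place of this.DbContext.Orders. It says nothing about the SQL a provider would emit.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for this.DbContext.Orders.
var orders = new[]
{
    new Order(1, new DateTime(2023, 1, 5)),
    new Order(2, new DateTime(2023, 3, 1)),
    new Order(3, new DateTime(2023, 2, 10)),
};

var viaMaxBy = orders.MaxBy(x => x.CreationDateTime);

// The proposed lowering. FirstOrDefault() replaces Take(1) here only so that
// a single object comes back for the comparison.
var viaLowered = orders
    .OrderByDescending(x => x.CreationDateTime)
    .FirstOrDefault();

Console.WriteLine(viaMaxBy == viaLowered); // prints True
Console.WriteLine(viaLowered?.Id);         // prints 2

record Order(int Id, DateTime CreationDateTime);
```

OrderByDescending is a stable sort in LINQ to Objects, so on ties both forms return the first maximal element in source order.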

With complexities

The actual translation still only has to deal with the tuple in MaxBy. The rest is just distractions.

Using index IsDeleted, CreationDateTime, [Id].

// Usage
this.DbContext.Orders
	.Where(x => !x.IsDeleted)
	.Where(x => x.CreationDateTime < new DateTime(2023, 01, 01))
	.MaxBy(x => new { x.IsDeleted, x.CreationDateTime, x.Id }) // Tip for user: Include x.IsDeleted to match index explicitly
	.Select(x => x.Id);

// Proposed translation to lower expressions
this.DbContext.Orders
	.Where(x => !x.IsDeleted)
	.Where(x => x.CreationDateTime < new DateTime(2023, 01, 01))
	.OrderByDescending(x => x.IsDeleted)
	.ThenByDescending(x => x.CreationDateTime)
	.ThenByDescending(x => x.Id)
	.Take(1)
	.Select(x => x.Id);

-- Approximation of expected SQL
SELECT TOP(1) o.Id
FROM Orders o
WHERE o.IsDeleted = 0
AND o.CreationDateTime < '2023-01-01'
ORDER BY o.IsDeleted DESC, o.CreationDateTime DESC, o.Id DESC
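The composite case can also be checked in memory. Anonymous types do not implement IComparable, so this sketch swaps in a value tuple as the MaxBy key; value tuples compare component by component, which matches the OrderByDescending/ThenByDescending chain. The Order record and the sample rows are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for this.DbContext.Orders.
var orders = new[]
{
    new Order(1, false, new DateTime(2022, 6, 1)),
    new Order(2, false, new DateTime(2022, 12, 31)),
    new Order(3, false, new DateTime(2022, 12, 31)), // same timestamp as Id 2; Id breaks the tie
    new Order(4, true,  new DateTime(2022, 12, 31)), // removed by the IsDeleted filter
    new Order(5, false, new DateTime(2023, 2, 1)),   // removed by the date filter
};

var filtered = orders
    .Where(x => !x.IsDeleted)
    .Where(x => x.CreationDateTime < new DateTime(2023, 1, 1));

// Value tuple key (an assumption of this sketch, not the proposal's syntax).
var viaMaxBy = filtered.MaxBy(x => (x.IsDeleted, x.CreationDateTime, x.Id));

// One ordering clause per tuple component, in the same order.
var viaLowered = filtered
    .OrderByDescending(x => x.IsDeleted)
    .ThenByDescending(x => x.CreationDateTime)
    .ThenByDescending(x => x.Id)
    .First();

Console.WriteLine(viaMaxBy?.Id);  // prints 3
Console.WriteLine(viaLowered.Id); // prints 3

record Order(int Id, bool IsDeleted, DateTime CreationDateTime);
```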

Scenario 2: Grouped MaxBy

This one is better understood from the code, but here is the theory: recognize a MaxBy inside a Select whose source is an IGrouping<TKey, TElement>. Translate MaxBy(Selector) into Where(MatchesGroupKey).OrderByDescending(GroupKey).ThenByDescending(Selector).First().Id. Additionally, directly after the Select, perform a Join to obtain the winning rows from their IDs. This has the added benefit of isolating the intricacies of this part of the query, so that follow-up user syntax like Select, Join, or OrderBy keeps working without breaking the intended plan.

Simplest form

Using index CustomerId, CreationDateTime, [Id].

// Usage
this.DbContext.Orders
	.GroupBy(x => x.CustomerId)
	.Select(group => group.MaxBy(x => x.CreationDateTime));

// Proposed translation to lower expressions
this.DbContext.Orders
	.GroupBy(x => x.CustomerId)
	.Select(group => this.DbContext.Orders
		.Where(x => x.CustomerId == group.Key)
		.OrderByDescending(x => x.CustomerId)
		.ThenByDescending(x => x.CreationDateTime)
		.First().Id)
	.Join(this.DbContext.Orders, id => id, instance => instance.Id, (id, instance) => instance);

-- Approximation of expected SQL

SELECT o.*
FROM (
	SELECT (
		SELECT TOP(1) o.Id
		FROM Orders o
		WHERE o.CustomerId = groups.CustomerId
		ORDER BY o.CustomerId DESC, o.CreationDateTime DESC
	) AS MaxId
	FROM Orders groups
	GROUP BY groups.CustomerId
) AS Maxes
INNER JOIN Orders o ON o.Id = Maxes.MaxId
;

Having the translation always emit a left-complete ordering (including the CustomerId that was made constant by the Where) helps to (A) clarify the intended index and (B) work around a MySQL optimizer bug that, in subqueries, won’t recognize the appropriate index without it.
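The grouped lowering can be sanity-checked the same way. This is a sketch under stated assumptions: a hypothetical Order record, an array in place of this.DbContext.Orders, and group.Key used to read the grouping key as LINQ's IGrouping exposes it. It checks only that both forms select the same rows, not what a provider would generate.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for this.DbContext.Orders.
var orders = new[]
{
    new Order(1, 10, new DateTime(2023, 1, 1)),
    new Order(2, 10, new DateTime(2023, 3, 1)),
    new Order(3, 20, new DateTime(2023, 2, 1)),
    new Order(4, 20, new DateTime(2023, 1, 15)),
};

// Usage form: newest order per customer.
var viaMaxBy = orders
    .GroupBy(x => x.CustomerId)
    .Select(group => group.MaxBy(x => x.CreationDateTime)!.Id)
    .OrderBy(id => id)
    .ToArray();

// Lowered form: a correlated lookup of the winning Id per group, then a join
// back to the source to recover the full rows.
var viaLowered = orders
    .GroupBy(x => x.CustomerId)
    .Select(group => orders
        .Where(x => x.CustomerId == group.Key)
        .OrderByDescending(x => x.CustomerId)
        .ThenByDescending(x => x.CreationDateTime)
        .First().Id)
    .Join(orders, id => id, instance => instance.Id, (id, instance) => instance)
    .Select(x => x.Id)
    .OrderBy(id => id)
    .ToArray();

Console.WriteLine(string.Join(",", viaMaxBy));   // prints 2,3
Console.WriteLine(string.Join(",", viaLowered)); // prints 2,3

record Order(int Id, int CustomerId, DateTime CreationDateTime);
```

The final OrderBy is there only to make the two result sets comparable; neither form guarantees an order across groups.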

Notably, in Select(group => [...].MaxBy(x => [...])), MaxBy must be the final expression inside the Select. Attempting something like .Select(group => group.MaxBy(x => x.CreationDateTime).SomeOtherProperty) should result in an untranslatable query, because it would prevent us from taking the ID and appending the Join. This constraint should be acceptable: the only use case I can think of is selecting a single property instead of the entire entity. That can still be achieved by adding Select(x => x.SomeOtherProperty) at the end of the query.

With complexities

To maximize complexity, we’ll use a composite group key (CustomerId, ShopId) and a composite selector (IsDeleted, CreationDateTime). We’ll also add some conditions, such as one that cuts off a time window using the index, and another that scans over a few mismatching items.

Using index CustomerId, ShopId, IsDeleted, CreationDateTime, [Id].

// Usage
this.DbContext.Orders
	.Where(x => x.CustomerId > 1000)
	.GroupBy(x => new { x.CustomerId, x.ShopId })
	.Select(group => group
		.Where(x => !x.IsDeleted)
		.Where(x => x.CreationDateTime < new DateTime(2023, 01, 01))
		.Where(x => !x.IsRareExclusion) // Scan over rare exclusions when finding group max (non-indexed)
		.MaxBy(x => new { x.IsDeleted, x.CreationDateTime }));

// Proposed translation to lower expressions
this.DbContext.Orders
	.Where(x => x.CustomerId > 1000)
	.GroupBy(x => new { x.CustomerId, x.ShopId })
	.Select(group => this.DbContext.Orders
		.Where(x => !x.IsDeleted) // User condition (indexed)
		.Where(x => x.CreationDateTime < new DateTime(2023, 01, 01)) // User condition (indexed)
		.Where(x => !x.IsRareExclusion) // User condition (non-indexed)
		.Where(x => x.CustomerId == group.Key.CustomerId && x.ShopId == group.Key.ShopId) // Group condition
		.OrderByDescending(x => x.CustomerId)
		.ThenByDescending(x => x.ShopId)
		.ThenByDescending(x => x.IsDeleted)
		.ThenByDescending(x => x.CreationDateTime)
		.First().Id)
	.Join(this.DbContext.Orders, id => id, instance => instance.Id, (id, instance) => instance);

-- Approximation of expected SQL

SELECT o.*
FROM (
	SELECT (
		SELECT TOP(1) o.Id
		FROM Orders o
		WHERE o.IsDeleted = 0
		AND o.CreationDateTime < '2023-01-01'
		AND o.IsRareExclusion = 0
		AND o.CustomerId = groups.CustomerId AND o.ShopId = groups.ShopId
		ORDER BY o.CustomerId DESC, o.ShopId DESC, o.IsDeleted DESC, o.CreationDateTime DESC
	) AS MaxId
	FROM Orders groups
	WHERE groups.CustomerId > 1000
	GROUP BY groups.CustomerId, groups.ShopId
) AS Maxes
INNER JOIN Orders o ON o.Id = Maxes.MaxId
;

1 reaction

Timovzl commented, Jan 17, 2023

That sounds odd; I’d carefully test [the claim of subqueries having a greater constant overhead than joins] and share concrete, comparative queries and their plans.

You’re right. Based on further testing, I can now say that my earlier claims regarding subqueries having a greater constant overhead certainly do not apply to SQL Server. It works as you might expect: it interprets the query to understand what you want, and produces a plan. Whether you expressed the query as a join or a dependent subquery makes little difference to it.

I was originally trained on MySQL 5.6, so it is very possible that what I claimed applies to that alone. I have not yet checked the behavior on MySQL 8, although I’m quite curious.