Allow users to express SQL row value comparison syntax
Row values are part of the SQL standard (though not every database supports them in all places), and they allow syntax like the following, for example in a WHERE clause:
WHERE (timestamp, id) > ('2019-03-10 00:00:00+00:00', 1032749689)
This syntax is especially valuable for keyset/seek/cursor pagination. Being able to use it with databases that support it would be a big win for seek pagination, and also for composite conditions in general that can benefit from it.
The row value syntax above is equivalent, in theory, to something like this:
WHERE
(timestamp > '2019-03-10 00:00:00+00:00') OR
(timestamp = '2019-03-10 00:00:00+00:00' AND id > 1032749689)
But, for one, this logical condition becomes more complex the more columns you have. In addition, the db’s implementation of row value comparison will usually be much more efficient than this (there are a few things you can do to help the db engine with the above condition, but the result doesn’t look nice or readable).
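As a quick sanity check of the equivalence claimed above, here is a minimal sketch in Python using the standard-library sqlite3 module (SQLite supports row values from version 3.15). The `posts` table, its columns, and the sample rows are made up for illustration and are not taken from the issue.

```python
import sqlite3

# Hypothetical table and data, invented purely for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (created TEXT, id INTEGER)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [
        ("2019-03-09", 5),
        ("2019-03-10", 1),
        ("2019-03-10", 7),
        ("2019-03-10", 9),
        ("2019-03-11", 2),
    ],
)

cursor = ("2019-03-10", 7)  # last row of the previous page

# Row value form (requires SQLite 3.15 or newer).
row_value_rows = conn.execute(
    "SELECT created, id FROM posts "
    "WHERE (created, id) > (?, ?) "
    "ORDER BY created, id",
    cursor,
).fetchall()

# Expanded logical form of the same condition.
expanded_rows = conn.execute(
    "SELECT created, id FROM posts "
    "WHERE created > ? OR (created = ? AND id > ?) "
    "ORDER BY created, id",
    (cursor[0], cursor[0], cursor[1]),
).fetchall()

print(row_value_rows == expanded_rows)
```

Both queries return the rows that sort strictly after the cursor. This only checks that the results agree; it says nothing about how a given engine plans either form.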
More info about row value here: https://use-the-index-luke.com/sql/partial-results/fetch-next-page#sb-row-values
My use case is a package I worked on that fully implements keyset pagination with EF Core, but does so by building the expressions so that it translates to the logical condition syntax above. I could use a row value translator.
For a related issue to keyset pagination see this: https://github.com/dotnet/efcore/issues/9115
Proposed solution
What I would love to see is a db function that EF Core understands and translates into the row value syntax for db providers that support it. Something like this:
var timestamp = ...;
var id = 1032749689;
// One option:
.Where(b => EF.Functions.GreaterThan(new[] { b.Column1, b.Column2 }, new[] { 1, 2 }))
// Another option with a tuple overload:
.Where(b => EF.Functions.GreaterThan((b.Column1, b.Column2), new[] { 1, 2 }))
// Translates to sql:
// WHERE (timestamp, id) > ('2019-03-10 00:00:00+00:00', 1032749689)
One thing I’m not sure of is whether there’s a mechanism that lets a consumer know if the current database provider supports this syntax, because I want to be able to fall back to my implementation when it doesn’t.
If this seems like an interesting and feasible addition, I’m willing to work on it.
Issue Analytics
- State:
- Created 2 years ago
- Reactions: 13
- Comments: 29 (14 by maintainers)
I’ve investigated this, and yes, I’m sure of at least one perf benefit. In PostgreSQL, I created a table with a million records and an index on (created, id), and ran a simple seek + limit on the 2 columns. Here’s the execution plan for a logical condition (x > a OR x = a AND y > b): there’s an Index Scan with a filter operation.
In contrast, here’s the execution plan when using row value instead:
The difference stems from the fact that, with the logical condition, dbs generally don’t realize they can use an access predicate on the 1st column, whereas with row value they know exactly which access predicates to use. One way to force dbs to recognize this without using row value is to add a redundant clause, as follows:
The first line above results in the db using an access predicate before filtering:
I’m not sure whether this exactly matches the perf of a db-implemented row value for 2 columns, but it definitely won’t match it for more than 2 columns (it becomes harder to keep generating these access predicate hints properly in the logical conditions).
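The snippet showing the redundant clause did not survive in this copy of the thread. A plausible reconstruction, offered as an assumption rather than the author's exact text, is to prefix the condition with `x >= a`, which the rest of the condition already implies. The Python sketch below (standard-library sqlite3, with a made-up table `t`) only checks that the extra clause leaves the result unchanged; whether a planner actually benefits depends on the engine.

```python
import sqlite3

# Hypothetical 4x4 grid of (x, y) pairs, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER, y INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(x, y) for x in range(4) for y in range(4)],
)

a, b = 1, 2  # seek position

plain = "x > :a OR (x = :a AND y > :b)"
# Assumed form of the redundant clause: it is implied by the condition
# that follows, so it cannot change the result, but it hands the planner
# a simple range predicate on the leading column.
hinted = "x >= :a AND (" + plain + ")"


def run(condition):
    sql = f"SELECT x, y FROM t WHERE {condition} ORDER BY x, y"
    return conn.execute(sql, {"a": a, "b": b}).fetchall()


plain_rows = run(plain)
hinted_rows = run(hinted)
print(plain_rows == hinted_rows)
```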
(There’s one alternative form to do this without using a redundant clause, but it’s much much harder for a human to understand.)
My test table is extremely simple (which is also why the filter had nothing to do after the access predicate). I’m sure the perf benefit will be more apparent in the real world.
I want to emphasize again how much easier it is to form the row value syntax than a generalized logical condition when we’re dealing with more than a few columns. Granted, it’s rare to have a query doing this over more than 2-3 columns, but anyway, the generalized condition expression for this is:
It’s very easy to get wrong (and a horror to maintain).
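The generalized expression itself is missing from this copy of the thread. The commonly used expansion, given here as a sketch rather than the author's exact formulation, has one OR branch per column: all earlier columns tie and the current column decides. The helper name `expand_row_comparison` and the `:a`-style parameter placeholders are hypothetical.

```python
def expand_row_comparison(columns, params, op=">"):
    """Build the logical equivalent of (c1, ..., cn) op (p1, ..., pn).

    Only the strict operators are handled, and all columns are assumed
    to share the same sort direction.
    """
    if op not in (">", "<"):
        raise ValueError("only strict comparisons are supported")
    if not columns or len(columns) != len(params):
        raise ValueError("columns and params must be non-empty and equal length")

    branches = []
    for i in range(len(columns)):
        # All earlier columns are equal; the i-th column decides.
        terms = [f"{c} = {p}" for c, p in zip(columns[:i], params[:i])]
        terms.append(f"{columns[i]} {op} {params[i]}")
        branches.append("(" + " AND ".join(terms) + ")")
    return " OR ".join(branches)


condition = expand_row_comparison(["a", "b", "c"], [":a", ":b", ":c"])
print(condition)
# (a > :a) OR (a = :a AND b > :b) OR (a = :a AND b = :b AND c > :c)
```

The number of terms grows quadratically with the number of columns, which is part of why the one-line row value form is so much easier to get right.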
This is great. Trying to translate into row value from a db function instead of pattern matching will obviously be much easier too.
Also, when using efcore.pg, I’m wondering (I haven’t checked much of the code) whether this will pick up the condition expressions I dynamically generate in my package, since the additional access predicate clause I add as an optimization might throw your pattern matcher off. (From a quick look, it seems to need a very specific form when there are 3 columns, which wouldn’t work with the generalized logical condition, and there’s no support for more than that?)
In conclusion, I do think it has performance benefits as the db can optimize the execution because the intent is clear, and it would still be much easier to write in C# (even if uglier than others) than forming the error-prone logical condition ourselves.
I think there’s also a lot to gain if the EF Core db provider in use automatically either chooses to translate it to row value when supported and applicable, or forms the logical condition otherwise (for reference, I do this here).
Good suggestion - we should definitely keep this in mind if/when we get around to implementing this for SQL Server (and analyze the perf to make sure).
Beyond the perf aspect, hopefully this improvement (currently in PG only) helps people adopt keyset pagination as it removes the need to deal with the complex expanded comparisons…