JOIN instead of CROSS APPLY in generated query in SQL Server

EF Core 3.0 Preview 5 would generate CROSS APPLY from a LINQ query like this:

from navObject in Context.NavObjects
join vessel in Context.Vessels on navObject.VesselId equals vessel.VesselId
from passage in Context.Passages
    .Where(x => x.VesselId == navObject.VesselId && x.ActualDepartureTime.Value <= fromTime)
    .OrderByDescending(x => x.ActualDepartureTime)
    .Take(1)
    .DefaultIfEmpty()

The generated query would be:

SELECT ... FROM [NavObject] AS [no]
INNER JOIN [Vessel] AS [vessel] ON [no].[ObjectId] = [vessel].[ObjectId]
CROSS APPLY (
    SELECT TOP(1) [x].*
    FROM [Passage] AS [x]
    WHERE ([x].[ObjectId] = [no].[ObjectId]) AND ([x].[ActualDepartureTime] <= @__fromTime_1)
    ORDER BY [x].[ActualDepartureTime] DESC
) AS [t]

In RC1 the query instead contains JOINs over nested SELECTs, which cause bad performance and timeouts:

SELECT ... FROM [NavObject] AS [n]
INNER JOIN [Vessel] AS [v] ON [n].[ObjectId] = [v].[ObjectId]
INNER JOIN (
    SELECT [t].....
    FROM (
        SELECT [p]...., ROW_NUMBER() OVER(PARTITION BY [p].[ObjectId] ORDER BY [p].[ActualDepartureTime] DESC) AS [row]
        FROM [Passage] AS [p]
        WHERE ([p].[ActualDepartureTime] <= @__fromTime_1)
    ) AS [t]
    WHERE [t].[row] <= 1
) AS [t0] ON [n].[ObjectId] = [t0].[ObjectId]

As you can clearly see, the Preview 5 generated query is clear and effective while the RC1 generated query is off. Please fix this query generation pattern.
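
To make the difference between the two SQL shapes concrete, here is a minimal in-memory sketch using LINQ to Objects. It is not EF Core and says nothing about how the provider translates queries. The `NavObject` and `Passage` records, the `LatestPassageShapes` class and both method names are hypothetical stand-ins. `DefaultIfEmpty()` is left out so that both methods have the inner-join behavior of the SQL shown above. It assumes C# 9 or later for records.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the NavObject and Passage tables.
public record NavObject(int VesselId);
public record Passage(int VesselId, DateTime ActualDepartureTime);

public static class LatestPassageShapes
{
    // APPLY shape: for each outer row, filter, sort and take one inner row.
    public static List<(int VesselId, DateTime Departure)> PerRowTopOne(
        IEnumerable<NavObject> navObjects, IEnumerable<Passage> passages, DateTime fromTime) =>
        (from n in navObjects
         from p in passages
             .Where(x => x.VesselId == n.VesselId && x.ActualDepartureTime <= fromTime)
             .OrderByDescending(x => x.ActualDepartureTime)
             .Take(1)
         select (VesselId: n.VesselId, Departure: p.ActualDepartureTime)).ToList();

    // ROW_NUMBER shape: rank every qualifying passage inside its vessel partition,
    // keep rank 1, then join the survivors back to the outer rows.
    public static List<(int VesselId, DateTime Departure)> RankThenJoin(
        IEnumerable<NavObject> navObjects, IEnumerable<Passage> passages, DateTime fromTime)
    {
        var firstPerVessel = passages
            .Where(p => p.ActualDepartureTime <= fromTime)
            .GroupBy(p => p.VesselId)
            .SelectMany(g => g.OrderByDescending(p => p.ActualDepartureTime)
                              .Select((p, i) => (Passage: p, Row: i + 1)))
            .Where(t => t.Row <= 1)
            .Select(t => t.Passage);

        return (from n in navObjects
                join p in firstPerVessel on n.VesselId equals p.VesselId
                select (VesselId: n.VesselId, Departure: p.ActualDepartureTime)).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var navObjects = new[] { new NavObject(1), new NavObject(2), new NavObject(3) };
        var passages = new[]
        {
            new Passage(1, new DateTime(2019, 9, 1)),
            new Passage(1, new DateTime(2019, 9, 10)),
            new Passage(2, new DateTime(2019, 9, 5)),
            new Passage(2, new DateTime(2019, 10, 1)), // later than fromTime, filtered out
        };
        var fromTime = new DateTime(2019, 9, 15);

        var a = LatestPassageShapes.PerRowTopOne(navObjects, passages, fromTime);
        var b = LatestPassageShapes.RankThenJoin(navObjects, passages, fromTime);
        Console.WriteLine(a.SequenceEqual(b)); // True: same rows, different amount of work
    }
}
```

Both methods return the same rows. The difference the issue is about is where the correlation on `VesselId` is applied: inside the inner query in the first shape, and only after every partition has been ranked in the second.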

Further technical details

EF Core version: 3.0 RC1 (versus 3.0 Preview 5)
Database provider: Microsoft.EntityFrameworkCore.SqlServer
Target framework: .NET Core 3.0
Operating system: Windows 10
IDE: Visual Studio 2019 16.2.5

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 23
  • Comments: 69 (31 by maintainers)

Top GitHub Comments

7 reactions
MoMack20 commented, Apr 4, 2023

Ran into an instance of this in production. After a table hit a certain number of records the query plan changed and it went from a 300 ms query to a 3 minute query.

Over the last couple of years of observing this query, my team has had to make multiple changes that deviate from the straightforward implementation, which should work without issue.

The standard query, which I would expect to perform well:

await ctx.MainEntitity
	.Where(me => me.StatusId < 6)
	.Select(me => new
	{
		me.Id,
		me.Name,
		RelatedEntityInfo = me.RelatedEntity
			.Select(re => new
			{
				re.MainEntityId,
				re.DateCreated,
				re.Status
			}).FirstOrDefault()
	}).ToListAsync()

The modified query, which was performing well until the issue in production yesterday:

await ctx.MainEntitity
	.Where(me => me.StatusId < 6)
	.Select(me => new
	{
		me.Id,
		me.Name,
		RelatedEntityInfo = me.RelatedEntity
			.Where(re => re.MainEntitityId == me.Id)
			.Select(re => new
			{
				re.MainEntityId,
				re.DateCreated,
				re.Status
			}).FirstOrDefault()
	}).ToListAsync()

The current query, which solved the performance issues. Writing it this way forces the filter into the same block as the select instead of using the window function:

await ctx.MainEntitity
	.Where(me => me.StatusId < 6)
	.Select(me => new
	{
		me.Id,
		me.Name,
		RelatedEntityInfo = me.RelatedEntity
			.Where(re => re.MainEntitityId == me.Id)
			.Select(re => new
			{
				re.MainEntityId,
				re.DateCreated,
				re.Status
			}).Take(1)
			.FirstOrDefault()
	}).ToListAsync()

Here is a graph of the performance impact, showing when the query went sour and when the last change was put into place. [image]

Here is the SQL query given for the EF query without the “Take” statement.

LEFT JOIN (
    SELECT [t1].[Status], [t1].[DateCreated], [t1].[c], [t1].[MainEntitityId]
    FROM (
        SELECT [r].[Status], [r].[DateCreated], 1 AS [c], [r].[MainEntitityId], ROW_NUMBER() OVER(PARTITION BY [r].[MainEntitityId], [r].[MainEntitityId] ORDER BY [r].[Id]) AS [row]
        FROM [dbo].[RelatedEntity] AS [r]
    ) AS [t1]
    WHERE [t1].[row] <= 1
) AS [t0] ON [p1].[Id] = [t0].[MainEntitityId] AND [t].[MainEntitityId] = [t0].[MainEntitityId]

Here is the SQL query given for the EF query with the “Take” statement.

OUTER APPLY (
    SELECT TOP(1) [t1].[MainEntitityId], [t1].[Status], [t1].[DateCreated], 1 AS [c]
    FROM (
        SELECT TOP(1) [r].[MainEntitityId], [r].[Status], [r].[DateCreated]
        FROM [dbo].[RelatedEntity] AS [r]
        WHERE ([p1].[Id] IS NOT NULL) AND [p1].[Id] = [r].[MainEntitityId] AND [t].[MainEntitityId] = [r].[MainEntitityId]
    ) AS [t1]
) AS [t0]

Using Take(1) before FirstOrDefault() seems to force the OUTER APPLY, with the filtering done inside the same block as the select, as opposed to the LEFT JOIN, where the filter is applied outside the block that contains the select.
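
The workaround only makes sense if adding `Take(1)` leaves the result unchanged. As a small sanity check, assuming nothing beyond LINQ to Objects, the sketch below compares the two forms in memory. `TakeOneCheck` and `SameResult` are hypothetical names, and this does not exercise EF Core's translation at all.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TakeOneCheck
{
    // Returns true when FirstOrDefault() and Take(1).FirstOrDefault()
    // yield the same element (or the same default) for the given sequence.
    public static bool SameResult<T>(IEnumerable<T> source)
    {
        var plain = source.FirstOrDefault();
        var withTake = source.Take(1).FirstOrDefault();
        return EqualityComparer<T>.Default.Equals(plain, withTake);
    }

    public static void Main()
    {
        Console.WriteLine(SameResult(new[] { 3, 1, 2 }));      // True
        Console.WriteLine(SameResult(Array.Empty<string>()));  // True (both are null)
    }
}
```

Since the two forms agree on results, the extra `Take(1)` acts purely as a hint that changes the shape of the generated SQL, which matches what the two SQL snippets above show.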

5 reactions
dmitry-slabko commented, Dec 6, 2019

OK, some more input on this problem. Here is the LINQ:

from vessel in Context.Vessels.Where(...)
from position in Context.Positions
    .Where(t => t.VesselId == vessel.VesselId && t.Time <= fromTime)
    .OrderByDescending(s => s.Time)
    .Take(1)
    .DefaultIfEmpty()
select new LocationPoint ...

The intent is to get the latest point for each vessel id. In 3.0 Preview 5 this would generate the following SQL:

SELECT ... FROM [Vessel] AS [v]
CROSS APPLY (
    SELECT [t3].*
    FROM (
        SELECT NULL AS [empty]
    ) AS [empty1]
    LEFT JOIN (
        SELECT TOP(1) [p0].*
        FROM [Position] AS [p0]
        WHERE ([p0].[ObjectId] = [v].[ObjectId]) AND ([p0].[Time] <= @__fromTime_4)
        ORDER BY [p0].[Time] DESC
    ) AS [t3] ON 1 = 1
) AS [t4]

The subquery to retrieve data from Position is effectively filtered.

Since Preview 5, and up to the 3.1 release, the generated query looks like this:

SELECT ... FROM [Vessel] AS [v]
LEFT JOIN (
    SELECT ...
    FROM (
        SELECT ..., ROW_NUMBER() OVER(PARTITION BY [p].[ObjectId] ORDER BY [p].[Time] DESC) AS [row]
        FROM [Position] AS [p]
        WHERE [p].[Time] <= @__fromTime_1
    ) AS [t]
    WHERE [t].[row] <= 1
) AS [t0] ON [v].[ObjectId] = [t0].[ObjectId]

And this is the problem: the inner subquery reads all rows from the Position table, which in our case is 16+ million rows and may be far more for some other customers. On top of that, the subquery is executed for each row in the master query. So it appears that the use of partitioned queries for MS SQL was based on wrong assumptions: this pattern generates queries that do not perform well even on small data sets, while on large data sets they simply kill the reader.

I cannot say how this pattern behaves on other servers, such as PostgreSQL and Oracle, but for MS SQL it is not suitable. I would highly recommend changing the query generation pattern for such LINQ expressions back to what it was up until 3.0 Preview 5.
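
The cost argument above can be illustrated with a toy model that just counts how many Position rows each shape has to look at. This is only a sketch under loud assumptions: it is plain C#, not SQL Server, the `ILookup` merely stands in for an index on the vessel key, the time filter is omitted, and `Position`, `RowsTouched` and both method names are hypothetical. Real costs depend on the optimizer, the indexes and the statistics.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the Position table.
public record Position(int VesselId, DateTime Time);

public static class RowsTouched
{
    // ROW_NUMBER shape: every position row is visited to find the latest row
    // per vessel, and only afterwards is the result joined to the selected vessels.
    public static int RankAllThenJoin(IReadOnlyList<Position> positions, ISet<int> selectedVessels)
    {
        int touched = 0;
        var latest = new Dictionary<int, Position>();
        foreach (var p in positions)
        {
            touched++;
            if (!latest.TryGetValue(p.VesselId, out var best) || p.Time > best.Time)
                latest[p.VesselId] = p;
        }
        // The join to the outer rows happens only after the full pass.
        _ = latest.Where(kv => selectedVessels.Contains(kv.Key)).ToList();
        return touched;
    }

    // APPLY shape: for each selected vessel, visit only that vessel's rows.
    // The lookup plays the role of an index on VesselId (an assumption).
    public static int SeekPerVessel(ILookup<int, Position> index, IEnumerable<int> selectedVessels)
    {
        int touched = 0;
        foreach (var id in selectedVessels)
            foreach (var _ in index[id])
                touched++;
        return touched;
    }

    public static void Main()
    {
        var start = new DateTime(2019, 1, 1);
        // 1,000 vessels with 100 positions each: 100,000 rows in total.
        var positions = (from v in Enumerable.Range(1, 1000)
                         from i in Enumerable.Range(0, 100)
                         select new Position(v, start.AddHours(i))).ToList();
        var selected = new HashSet<int> { 1, 2, 3, 4, 5 };

        Console.WriteLine(RankAllThenJoin(positions, selected));                          // 100000
        Console.WriteLine(SeekPerVessel(positions.ToLookup(p => p.VesselId), selected));  // 500
    }
}
```

In this model the ranked shape visits every row no matter how few vessels the outer query selects, while the per-vessel shape scales with the number of selected vessels. That is the behavior the comment describes for the 16+ million row table.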
