Poorly performing query when selecting parent entity from filtered child entities
I’m trying to pull data for an entity by filtering on its related entities. This is like the v5 filtered include, except I want the entire query filtered by the child entities, not just the child entities that are returned.
The query that’s generated is terribly inefficient when doing this. What can I do to get a better generated query for this, while also getting the results I want in a single query?
Include your code
Models
(Somewhat simplified for this purpose)
public class MineGroup : EquatableModel
{
    public long Id { get; set; }
    public HashSet<Mine> _mines;
    public IEnumerable<Mine> Mines => _mines?.ToList();
}

public class Mine
{
    public long Id { get; set; }
    public long MineGroupId { get; private set; }
    public MineGroup MineGroup { get; private set; }
    public float Latitude { get; set; }
    public float Longitude { get; set; }
}
EF Core Query
The objective here is to get all MineGroup where their mines exist within the specified bounds, and have the Mines collection populated.
MineGroup[] groups = await _context.Mines
    .Where(x => x.Longitude >= bbox.West)
    .Where(x => x.Longitude <= bbox.East)
    .Where(x => x.Latitude <= bbox.North)
    .Where(x => x.Latitude >= bbox.South)
    .Include(x => x.MineGroup)
    .ThenInclude(x => x.Mines)
    .Select(x => x.MineGroup)
    .AsNoTracking()
    .ToArrayAsync();
Generated SQL
I actually have ~500 mines in this area. However, this query has a result set of 100k rows:
SELECT "m0"."Id", "m"."Id", "m1"."Id", "m1"."Latitude", "m1"."Longitude", "m1"."MineGroupId"
FROM "Mines" AS "m"
INNER JOIN "MineGroups" AS "m0" ON "m"."MineGroupId" = "m0"."Id"
LEFT JOIN "Mines" AS "m1" ON "m0"."Id" = "m1"."MineGroupId"
WHERE ((("m"."Longitude" >= -119.13574) AND ("m"."Longitude" <= -119.11377)) AND ("m"."Latitude" <= 43.755226)) AND ("m"."Latitude" >= 43.739353)
ORDER BY "m"."Id", "m0"."Id", "m1"."Id"
Include provider and version information
EF Core version: 5.0.3
Database provider: Sqlite
Target framework: .NET 5
Operating system: Windows 10
IDE: VS 2019 16.8.5
Issue Analytics
- Created 2 years ago
- Comments: 8 (5 by maintainers)
@douglasg14b I took a look at this as well… I agree with @smitpatel’s comment above that your LINQ query does seem to correspond to what you’re trying to do (“get all MineGroup where their mines exist within the specified bounds, and have the Mines collection populated”), and that the SQL EF generates seems correct as well. But I think we have a slight communication disconnect here.
Importantly, as you’ve written it, your query returns all MineGroups which contain at least one mine in your bbox, but for each of these groups it pulls back all mines when populating the Mines collection. That means it returns mines which are not in your original bbox, simply because they’re in a MineGroup with another mine that is in the bbox. That probably explains why you’re seeing a lot more mines coming back than you expect.
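To make the row multiplication concrete, here is a small in-memory sketch (LINQ to Objects, not EF Core) that mimics the join shape of the generated SQL. The data is made up for illustration: two groups, five mines, and an `InBox` flag standing in for the bbox comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: group 1 holds mines 1-3, group 2 holds mines 4-5.
// Only mines 1 and 2 lie inside the bounding box.
var mines = new List<(int Id, int GroupId, bool InBox)>
{
    (1, 1, true), (2, 1, true), (3, 1, false),
    (4, 2, false), (5, 2, false),
};

// Mirrors the generated SQL: the filtered mines ("m") are joined back to
// every mine ("m1") that shares their group, inside the box or not.
var rows = mines
    .Where(m => m.InBox)
    .Join(mines, m => m.GroupId, m1 => m1.GroupId,
          (m, m1) => (FilteredMine: m.Id, LoadedMine: m1.Id))
    .ToList();

Console.WriteLine(rows.Count);                          // 6 rows for 2 matching mines
Console.WriteLine(rows.Count(r => r.LoadedMine == 3));  // 2: the out-of-box mine appears twice
```

Each matching mine contributes one row per mine in its group, so the row count grows with group size rather than with the number of mines in the box.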
If what you want is to get back MineGroups with their Mines collection populated, but only with mines in the bbox, then you need to repeat your filter in the filtered include as well:
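A minimal sketch of that query shape, assuming the `MineGroup` and `Mine` models from the issue and EF Core 5 or later. `BoundingBox` and `MineGroupQueries` are hypothetical names introduced here (the issue only shows the `West`/`East`/`North`/`South` members of `bbox`), and `groups` is assumed to be the context's `DbSet<MineGroup>`:

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical bounding-box type; float members are an assumption.
public record BoundingBox(float West, float East, float North, float South);

public static class MineGroupQueries
{
    // `groups` is assumed to be the DbSet<MineGroup> of the application's context.
    public static Task<MineGroup[]> InBoundsWithBoundedMinesAsync(
        IQueryable<MineGroup> groups, BoundingBox bbox)
    {
        return groups
            // 1. Keep only groups that have at least one mine inside the box.
            .Where(g => g.Mines.Any(m =>
                m.Longitude >= bbox.West && m.Longitude <= bbox.East &&
                m.Latitude <= bbox.North && m.Latitude >= bbox.South))
            // 2. Filtered include: load only the mines inside the box.
            .Include(g => g.Mines.Where(m =>
                m.Longitude >= bbox.West && m.Longitude <= bbox.East &&
                m.Latitude <= bbox.North && m.Latitude >= bbox.South))
            .AsNoTracking()
            .ToArrayAsync();
    }
}
```

Because the query starts from the groups rather than from the mines, each matching group should be returned once, not once per matching mine.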
SQL:
If that seems unnecessary, consider that in other scenarios the two things - which groups are returned, and which mines are populated in their mines lists - really are separate. You may want all groups in the database, but with only bounded mines in their collections (so some groups would have 0 mines), or you may want only groups which have at least one bounded mine, but with all mines populated (which is what your query does above).
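The two scenarios described above could be sketched as follows, under the same assumptions as the issue's models. `BoundingBox` and `MineGroupVariants` are hypothetical names, and `groups` is assumed to be the context's `DbSet<MineGroup>`:

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical bounding-box type; float members are an assumption.
public record BoundingBox(float West, float East, float North, float South);

public static class MineGroupVariants
{
    // Every group is returned; only in-bounds mines are loaded,
    // so some groups come back with an empty Mines collection.
    public static Task<MineGroup[]> AllGroupsBoundedMinesAsync(
        IQueryable<MineGroup> groups, BoundingBox bbox) =>
        groups
            .Include(g => g.Mines.Where(m =>
                m.Longitude >= bbox.West && m.Longitude <= bbox.East &&
                m.Latitude <= bbox.North && m.Latitude >= bbox.South))
            .AsNoTracking()
            .ToArrayAsync();

    // Only groups with at least one in-bounds mine are returned,
    // but all of their mines are loaded.
    public static Task<MineGroup[]> BoundedGroupsAllMinesAsync(
        IQueryable<MineGroup> groups, BoundingBox bbox) =>
        groups
            .Where(g => g.Mines.Any(m =>
                m.Longitude >= bbox.West && m.Longitude <= bbox.East &&
                m.Latitude <= bbox.North && m.Latitude >= bbox.South))
            .Include(g => g.Mines)
            .AsNoTracking()
            .ToArrayAsync();
}
```

In the first method the predicate lives only in the `Include`, and in the second only in the `Where`, which is what keeps the two decisions independent.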
I hope this helps explain why your query behaves the way it does, and what you may need to do to get the behavior you want.
The LINQ query you posted above corresponds to that, and the generated SQL is accurate for it. Based on the above, having only 500 mines within the specified bounds has no connection to the size of the result set of the SQL or LINQ query.