Poorly performing query when selecting parent entity from filtered child entities

Ask a question

I’m trying to pull data for an entity by filtering on its related entities. This is like the v5 include filtering, except I actually want the entire query filtered by it, not just the child entities that are returned.

The query that’s generated is terribly inefficient when doing this. What can I do to get a better generated query for this, while also getting the results I want in a single query?

Include your code

Models

(Somewhat simplified for this purpose)

    public class MineGroup : EquatableModel
    {
        public long Id { get; set; }

        public HashSet<Mine> _mines;
        public IEnumerable<Mine> Mines => _mines?.ToList();
    }

    public class Mine
    {
        public long Id { get; set; }

        public long MineGroupId { get; private set; }
        public MineGroup MineGroup { get; private set; }

        public float Latitude { get; set; }
        public float Longitude { get; set; }
    }

EF Core Query

The objective here is to get all MineGroup where their mines exist within the specified bounds, and have the Mines collection populated.

        MineGroup[] groups = await _context.Mines
                .Where(x => x.Longitude >= bbox.West)
                .Where(x => x.Longitude <= bbox.East)
                .Where(x => x.Latitude <= bbox.North)
                .Where(x => x.Latitude >= bbox.South)
                .Include(x => x.MineGroup)
                    .ThenInclude(x => x.Mines)
                .Select(x => x.MineGroup)
                .AsNoTracking()
                .ToArrayAsync();

Generated SQL

I actually have ~500 mines that are in this area. However, this query has a result set of 100k

      SELECT "m0"."Id", "m"."Id", "m1"."Id", "m1"."Latitude", "m1"."Longitude", "m1"."MineGroupId"
      FROM "Mines" AS "m"
      INNER JOIN "MineGroups" AS "m0" ON "m"."MineGroupId" = "m0"."Id"
      LEFT JOIN "Mines" AS "m1" ON "m0"."Id" = "m1"."MineGroupId"
      WHERE ((("m"."Longitude" >= -119.13574) AND ("m"."Longitude" <= -119.11377)) AND ("m"."Latitude" <= 43.755226)) AND ("m"."Latitude" >= 43.739353)
      ORDER BY "m"."Id", "m0"."Id", "m1"."Id"

Include provider and version information

EF Core version: 5.0.3
Database provider: Sqlite
Target framework: .NET 5
Operating system: Windows 10
IDE: VS 2019 16.8.5

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

1 reaction
roji commented, Jun 9, 2021

@douglasg14b I took a look at this as well… I agree with @smitpatel’s comment above that your LINQ query does seem to correspond to what you’re trying to do (“get all MineGroup where their mines exist within the specified bounds, and have the Mines collection populated”), and that the SQL EF generates seems correct as well. But I think we have a slight communication disconnect here.

> I actually have ~500 mines that are in this area. However, this query has a result set of 100k

So importantly, as you’ve written it, your query returns all MineGroups which contain at least one mine in your bbox, but for each of these groups it pulls back all mines when populating the Mines collection. That means it returns mines which are not in your original bbox, simply because they’re in a MineGroup with another mine that is in the bbox. That probably explains why you’re seeing a lot more mines coming back than you expect.
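To make the row count concrete, here is a minimal in-memory sketch that is not from the thread. It assumes a hypothetical `MineRow` record and made-up data, and it mimics the shape of the generated SQL with LINQ to Objects: every mine inside the box is paired with every mine in its group, so a group contributes (mines in box) × (mines in group) rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Mine entity; InBox replaces the bbox comparison.
public record MineRow(long Id, long MineGroupId, bool InBox);

public static class FanOutDemo
{
    // Pairs each mine inside the box ("m") with every mine of the same
    // group ("m1"), like the INNER JOIN + LEFT JOIN in the generated SQL.
    public static int CountJoinedRows(IReadOnlyCollection<MineRow> mines) =>
        mines.Where(m => m.InBox)
             .Join(mines,
                   m => m.MineGroupId,
                   m1 => m1.MineGroupId,
                   (m, m1) => (Filtered: m.Id, Sibling: m1.Id))
             .Count();

    // Made-up data: group 1 has 4 mines (2 in the box),
    // group 2 has 3 mines (1 in the box).
    public static List<MineRow> SampleMines() => new List<MineRow>
    {
        new MineRow(1, 1, true),  new MineRow(2, 1, true),
        new MineRow(3, 1, false), new MineRow(4, 1, false),
        new MineRow(5, 2, true),  new MineRow(6, 2, false),
        new MineRow(7, 2, false),
    };

    public static void Main()
    {
        var mines = SampleMines();
        Console.WriteLine($"Mines in box: {mines.Count(m => m.InBox)}"); // 3
        Console.WriteLine($"Joined rows:  {CountJoinedRows(mines)}");    // 2*4 + 1*3 = 11
    }
}
```

With only three mines in the box, the join already yields eleven rows. The same multiplication on larger groups would be consistent with ~500 matching mines producing a result set in the 100k range. Because the query is rooted at Mines, it also appears to return one MineGroup entry per matching mine rather than one per group.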

If what you want is to get back MineGroups with their Mines collection populated, but only with mines in the bbox, then you need to repeat your filter in the filtered include as well:

    var groups = await ctx.Mines
        .Where(m => m.Longitude >= bbox.West && m.Longitude <= bbox.East && m.Latitude <= bbox.North && m.Latitude >= bbox.South)
        .Include(x => x.MineGroup)
        .ThenInclude(x => x.Mines
            .Where(m => m.Longitude >= bbox.West && m.Longitude <= bbox.East && m.Latitude <= bbox.North && m.Latitude >= bbox.South))
        .Select(x => x.MineGroup)
        .AsNoTracking()
        .ToArrayAsync();

SQL:

    SELECT [m0].[Id], [m].[Id], [t].[Id], [t].[Latitude], [t].[Longitude], [t].[MineGroupId]
    FROM [Mines] AS [m]
    INNER JOIN [MineGroups] AS [m0] ON [m].[MineGroupId] = [m0].[Id]
    LEFT JOIN (
        SELECT [m1].[Id], [m1].[Latitude], [m1].[Longitude], [m1].[MineGroupId]
        FROM [Mines] AS [m1]
        WHERE ((([m1].[Longitude] >= @__bbox_West_0) AND ([m1].[Longitude] <= @__bbox_East_1)) AND ([m1].[Latitude] <= @__bbox_North_2)) AND ([m1].[Latitude] >= @__bbox_South_3)
    ) AS [t] ON [m0].[Id] = [t].[MineGroupId]
    WHERE ((([m].[Longitude] >= @__bbox_West_0) AND ([m].[Longitude] <= @__bbox_East_1)) AND ([m].[Latitude] <= @__bbox_North_2)) AND ([m].[Latitude] >= @__bbox_South_3)
    ORDER BY [m].[Id], [m0].[Id], [t].[Id]

If that seems unnecessary, consider that in other scenarios the two things - which groups are returned, and which mines are populated in their mines lists - really are separate. You may want all groups in the database, but with only bounded mines in their collections (so some groups would have 0 mines), or you may want only groups which have at least one bounded mine, but with all mines populated (which is what your query does above).

I hope this helps explain why your query behaves the way it does, and what you may need to do to get the behavior you want.
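The distinction drawn above, between which groups are returned and which mines fill their collections, can also be expressed by rooting the query at the groups instead of the mines. The following is a sketch under stated assumptions and was not proposed in the thread: the entity classes are simplified, and `MineContext`, its `MineGroups` set, and the `BoundingBox` record are hypothetical names. It relies on the filtered `Include` available from EF Core 5. Whether the SQL it produces performs better on a given database is not something the thread establishes.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Simplified, hypothetical versions of the thread's entities.
public class MineGroup
{
    public long Id { get; set; }
    public List<Mine> Mines { get; set; } = new List<Mine>();
}

public class Mine
{
    public long Id { get; set; }
    public long MineGroupId { get; set; }
    public MineGroup MineGroup { get; set; }
    public float Latitude { get; set; }
    public float Longitude { get; set; }
}

// Hypothetical bounding-box type; the thread does not show the real one.
public record BoundingBox(float West, float East, float North, float South);

// Hypothetical context; the DbSet names are assumptions.
public class MineContext : DbContext
{
    public MineContext(DbContextOptions<MineContext> options) : base(options) { }
    public DbSet<MineGroup> MineGroups => Set<MineGroup>();
    public DbSet<Mine> Mines => Set<Mine>();
}

public static class MineGroupQueries
{
    // Returns each group that has at least one mine in the box (the Where),
    // with its Mines collection limited to the mines in the box (the Include).
    public static Task<MineGroup[]> GroupsWithBoundedMinesAsync(
        MineContext ctx, BoundingBox bbox) =>
        ctx.MineGroups
            .Where(g => g.Mines.Any(m =>
                m.Longitude >= bbox.West && m.Longitude <= bbox.East &&
                m.Latitude <= bbox.North && m.Latitude >= bbox.South))
            .Include(g => g.Mines.Where(m =>
                m.Longitude >= bbox.West && m.Longitude <= bbox.East &&
                m.Latitude <= bbox.North && m.Latitude >= bbox.South))
            .AsNoTracking()
            .ToArrayAsync();
}
```

The two lambdas map onto the two separate decisions: dropping the `Where` would return all groups with only bounded mines in their collections, while replacing the filtered include with a plain `Include(g => g.Mines)` would return only groups with a bounded mine but with all of their mines populated. Since the query starts from `MineGroups`, each group should appear once in the result.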

0 reactions
smitpatel commented, Jun 9, 2021

> The objective here is to get all MineGroup where their mines exist within the specified bounds, and have the Mines collection populated.

The LINQ query you posted above corresponds to that, and the generated SQL is accurate for it. Having only 500 mines within the specified bounds has no bearing on the result set size of the SQL or LINQ query above.
