Eager Loading Many to Many Relationship Doesn't Use the Index
See original GitHub issueFile a bug
Okay, so I’m using SQLite. That might matter because DBMS’s are pointlessly different just for the sake of it.
I have a many to many relationship declared using List<Entity> in each class. That generally works, but I know it’s newer.
EF Core generates a query that SQLite doesn’t use an index search for, and instead it uses a scan, which is slow.
Include your code
public class Image
{
public int ImageId { get; set; }
public int Width { get; set; }
public int Height { get; set; }
public virtual List<ImageTag> Tags { get; set; }
}
public class ImageTag
{
public int ImageTagId { get; set; }
public string Name { get; set; }
public string Description { get; set; }
public string Type { get; set; }
[IgnoreDataMember]
public virtual List<Image> Images { get; set; }
}
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
// nullability
modelBuilder.Entity<Image>().Property(a => a.Width).IsRequired();
modelBuilder.Entity<Image>().Property(a => a.Height).IsRequired();
modelBuilder.Entity<ImageTag>().Property(a => a.Name).IsRequired();
// indexes
modelBuilder.Entity<Image>().HasIndex(a => a.Width);
modelBuilder.Entity<Image>().HasIndex(a => a.Height);
modelBuilder.Entity<ImageTag>().HasIndex(a => a.Name).IsUnique();
modelBuilder.Entity<ImageTag>().HasIndex(a => a.Type);
}
Query
var namesToLookup = new List<string> { "a", "b", "c" };
var existingTags = await Set<ImageTag>().Include(a => a.Images)
.Where(a => namesToLookup.Contains(a.Name)).ToListAsync(token);
So that generates a query like so:
SELECT "i"."ImageTagId", "i"."Description", "i"."Name", "i"."Type", "t"."ImagesImageId", "t"."TagsImageTagId", "t"."ImageId", "t"."Height", "t"."Width"
FROM "ImageTags" AS "i"
LEFT JOIN (
SELECT "i0"."ImagesImageId", "i0"."TagsImageTagId", "i1"."ImageId", "i1"."Height", "i1"."Width"
FROM "ImageImageTag" AS "i0"
INNER JOIN "Images" AS "i1" ON "i0"."ImagesImageId" = "i1"."ImageId"
) AS "t" ON "i"."ImageTagId" = "t"."TagsImageTagId"
WHERE "i"."Name" IN ('a', 'b', 'c')
ORDER BY "i"."ImageTagId", "t"."ImagesImageId", "t"."TagsImageTagId", "t"."ImageId"
Which explains to this
MATERIALIZE 1
SCAN TABLE ImageImageTag AS i0
SEARCH TABLE Images AS i1 USING INTEGER PRIMARY KEY (rowid=?)
SEARCH TABLE ImageTags AS i USING INDEX IX_ImageTags_Name (Name=?)
SEARCH SUBQUERY 1 AS t USING AUTOMATIC COVERING INDEX (TagsImageTagId=?)
USE TEMP B-TREE FOR ORDER BY
And because of the SCAN TABLE, it’s really slow, like minutes, which would be okay if it wasn’t a really easy fix.
SELECT "i"."ImageTagId", "i"."Description", "i"."Name", "i"."Type", "i0"."ImagesImageId", "i0"."TagsImageTagId", "i1"."ImageId", "i1"."Height", "i1"."Width"
FROM "ImageTags" AS "i"
INNER JOIN "ImageImageTag" AS "i0" ON "i0"."TagsImageTagId" = "i"."ImageTagId"
INNER JOIN "Images" AS "i1" ON "i0"."ImagesImageId" = "i1"."ImageId"
WHERE "i"."Name" IN ('a', 'b', 'c')
ORDER BY "i"."ImageTagId", "i0"."ImagesImageId", "i0"."TagsImageTagId", "i1"."ImageId"
Explains to
SEARCH TABLE ImageTags AS i USING INDEX IX_ImageTags_Name (Name=?)
SEARCH TABLE ImageImageTag AS i0 USING INDEX IX_ImageImageTag_TagsImageTagId (TagsImageTagId=?)
SEARCH TABLE Images AS i1 USING INTEGER PRIMARY KEY (rowid=?)
USE TEMP B-TREE FOR ORDER BY
That’s fast. SQLite Studio just says [00:28:36] Query finished in 0.000 second(s).
So this issue is twofold.
- Is it like that for a (good) reason?
- Is there a workaround that doesn’t require me to manually define my intermediate tables or execute raw sql?
I have a several hundred gig database already built, so I’m not running a migration to define the tables myself to then try to manually build a better query in LINQ. I’m not running raw sql because that defeats the purpose. If I was willing to hardcode sql, I wouldn’t be using an ORM with so many known quirks. No offense, it’s still by far the best option.
Sorry for not following the template exactly, but I’ve seen this reproduced in other issues. They just didn’t complain about it, which makes me think it may be an SQLite <-> EFCore quirk. It’s not a specific bug, but more of a logic issue.
Include provider and version information
EF Core version: 6.0.0-preview4… System.Data.Sqlite.Core is the same version Database provider: SQLite Target framework: net5.0
I would be willing to use net6.0, but this app runs on linux, and installing the net6.0 preview tools for linux is still a pita.
Issue Analytics
- State:
- Created 2 years ago
- Comments:19 (8 by maintainers)
This is something out-of-scope for EF to do automatically as determining whether this optimization would be appropriate for a given query and model is not trivial and might depend on the data and user’s intent.
For future peeps that may come across this, I used the following code:
To generate the following SQL:
The grouping, model relationship mapping, and context attaching needs to be post processed at the end, but it’s still much faster than trying to run the subquery in my case. EDIT: After letting it run against real data for a few hours, it went from an average of 4:12s (rounding) to <1s. The mapping table in question is several hundred thousand records.
Because of the inaccessibility of the intermediate table, I think this could be included in EF with some optimization and cleanup. I can’t think of any situation where the first left join would not be null without the second left join matching, as only EF can make them when a user provides a mapping of Image to ImageTag.