AOT query mode: precompiled queries (1st part of the query pipeline)
Remaining sub-tasks
- ExecuteUpdate/Delete
- SQL queries
- Split query
- GroupBy final operator (materializer work)
- Shared-type entity types (in various places where we do entity lookups)
Bugs and edge cases
General design
When a LINQ query is first encountered, EF “compiles” it, producing a code-generated shaper, SQL (for relational databases), etc. This process is both somewhat slow (increasing startup time) and incompatible with AOT environments (since it relies on code generation at runtime). While several approaches have been discussed in the past to improve this (e.g. #16496), with the advent of source generators we have some new possibilities. I’ve done some work on a proof-of-concept source generator which identifies EF queries and precompiles them; the work is far from complete but indicates that the approach is feasible.
In a nutshell, we would:
- Identify a query in user source code
  - A first implementation would identify invocations of EF’s compiled query API (EF.CompileQuery); this is a trivial and low-risk way to immediately identify EF queries in the user’s code.
  - We could later also attempt to precompile regular queries which don’t use EF.CompileQuery. This would be an additional step in which we identify DbSets (as member accesses on a DbContext-typed identifier), and then walk up the syntax tree, progressively including methods as long as they accept IQueryable. Once we reach a method which doesn’t accept IQueryable (e.g. ToList), we’ve reached the end of the query to be compiled.
  - Dynamically-constructed queries wouldn’t be supported.
- Transform the query to a LINQ expression tree
  - Once we have a Roslyn syntax tree representing a query (either from EF.CompileQuery or from a regular query), it needs to be transformed into a LINQ expression tree, which is what EF’s query pipeline requires.
  - Unlike the Roslyn structures, LINQ expression trees refer to actual .NET types, MemberInfos, etc. We would therefore need to load the user’s assembly (from the input compilation given to the source generator), and use reflection to load actual types from it (e.g. entity CLR types). See note on AssemblyLoadContext below.
- Compile the query with EF Core
  - Once we have a LINQ expression tree, we need to pass it to EF’s query compiler. To do this:
    - We instantiate the user’s DbContext type, using the parameterless constructor
    - Extract the IQueryCompiler service from it
    - Invoke the compiler, passing it the LINQ expression tree.
  - The output of this compilation is another LINQ expression tree, which instantiates e.g. a SingleQueryingEnumerable given a QueryContext. This output tree must not contain any compiled elements, e.g. the shaper must be present in non-compiled form. This would require some refactoring of the last parts of the query pipeline.
- Generate C# out of the compilation output
  - In the normal flow, the output LINQ expression tree is now compiled to produce a lambda (returning e.g. an enumerable given a QueryContext).
  - In the AOT flow, the expression tree would instead be written out as C# code into a file emitted by the source generator. This generated code would be invoked by EF as part of startup, and would pre-populate its query cache.
  - This would require writing a component to convert a LINQ expression tree to C# code, possibly going through a Roslyn syntax tree for maximum flexibility.
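To make the difference between the normal flow and the AOT flow concrete, here is a minimal sketch that uses only `System.Linq.Expressions`. It is not EF Core code: `Blog`, `CSharpPrinter` and `Demo` are hypothetical names invented for illustration, and the printer handles only the handful of node types in this one predicate. It builds an expression tree from reflection metadata (as a tool would after loading the user's assembly), then either compiles it at runtime or renders it as C# text.

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical entity type standing in for a class from the user's model.
public sealed class Blog
{
    public int Rating { get; set; }
}

// Hypothetical, deliberately tiny expression-to-C# printer. It only covers
// single-parameter lambdas, '>', member access, parameters and int constants;
// a real translator would have to cover far more node types.
public sealed class CSharpPrinter
{
    public string Print(Expression node) => node switch
    {
        LambdaExpression l =>
            $"({l.Parameters[0].Type.Name} {l.Parameters[0].Name}) => {Print(l.Body)}",
        BinaryExpression { NodeType: ExpressionType.GreaterThan } b =>
            $"{Print(b.Left)} > {Print(b.Right)}",
        MemberExpression { Expression: { } target } m =>
            $"{Print(target)}.{m.Member.Name}",
        ParameterExpression p => p.Name ?? "p",
        ConstantExpression { Value: int i } => i.ToString(CultureInfo.InvariantCulture),
        _ => throw new NotSupportedException($"Unsupported node: {node.NodeType}")
    };
}

public static class Demo
{
    // Builds the tree for: (Blog b) => b.Rating > 3
    public static Expression<Func<Blog, bool>> BuildPredicate()
    {
        // Resolve the member through reflection rather than from C# syntax.
        PropertyInfo rating = typeof(Blog).GetProperty(nameof(Blog.Rating))
            ?? throw new InvalidOperationException("Property not found.");
        ParameterExpression b = Expression.Parameter(typeof(Blog), "b");
        return Expression.Lambda<Func<Blog, bool>>(
            Expression.GreaterThan(Expression.Property(b, rating), Expression.Constant(3)),
            b);
    }

    public static void Main()
    {
        Expression<Func<Blog, bool>> tree = BuildPredicate();

        // Normal flow: compile the tree at runtime (needs code generation).
        Func<Blog, bool> compiled = tree.Compile();
        Console.WriteLine(compiled(new Blog { Rating = 5 })); // prints "True"

        // AOT flow: emit source text instead of compiling.
        Console.WriteLine(new CSharpPrinter().Print(tree)); // prints "(Blog b) => b.Rating > 3"
    }
}
```

The real output tree would be much larger (shaper, command cache construction, and so on), which is why going through a Roslyn syntax tree rather than string concatenation may be the more robust choice.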
The final code added by the source generator would look something like the following:
    var selectExpression = ...;
    var readColumns = ...;
    var relationalCommandCache = new RelationalCommandCache(
        memoryCache,
        querySqlGeneratorFactory,
        relationalParameterBasedSqlProcessorFactory,
        selectExpression,
        readColumns,
        useRelationalNulls: false);

    var shaper = ...;

    var enumerable = new SingleQueryingEnumerable<Blog>(
        (RelationalQueryContext)QueryCompilationContext.QueryContextParameter,
        relationalCommandCache,
        shaper,
        typeof(Blog),
        standAloneStateManager: false,
        detailedErrorsEnabled: false,
        threadSafetyChecksEnabled: true);

    // Pre-populate EF Core's cache with the above enumerable
Additional notes:
- The above does not cover relational command caching (including SQL), which depends on parameter nullability. This means that some query compilation still remains at runtime (but no code generation).
- We may be able to reuse previously-precompiled queries if their source file hasn’t changed (e.g. by storing file hashes). This would also make the feature suitable for speeding up the developer inner loop.
- Query precompilation isn’t necessarily dependent on using compiled models (#1906), though using that would speed the process up.
- This could be helpful (thanks @bricelam)
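The file-hash idea mentioned in the notes above could look roughly like the sketch below. It is purely illustrative: `PrecompilationCache` and `NeedsPrecompilation` are hypothetical names, not part of EF Core, and the cache lives only in memory, whereas a real tool would have to persist the hashes between runs.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical cache that remembers a content hash per source file, so that
// files which did not change since the last run can be skipped.
public sealed class PrecompilationCache
{
    private readonly Dictionary<string, string> _hashes =
        new Dictionary<string, string>(StringComparer.Ordinal);

    // SHA-256 of the file content, as an uppercase hex string.
    public static string Hash(string sourceText)
    {
        using var sha = SHA256.Create();
        return Convert.ToHexString(sha.ComputeHash(Encoding.UTF8.GetBytes(sourceText)));
    }

    // Returns true when the file is new or its content changed since the
    // previous call, and records the new hash; returns false otherwise.
    public bool NeedsPrecompilation(string path, string sourceText)
    {
        string hash = Hash(sourceText);
        if (_hashes.TryGetValue(path, out string previous) && previous == hash)
        {
            return false;
        }

        _hashes[path] = hash;
        return true;
    }
}

public static class HashDemo
{
    public static void Main()
    {
        var cache = new PrecompilationCache();
        const string original = "var blogs = context.Blogs.ToList();";
        const string edited = "var blogs = context.Blogs.Where(b => b.Rating > 3).ToList();";

        Console.WriteLine(cache.NeedsPrecompilation("Blogs.cs", original)); // prints "True" (first sighting)
        Console.WriteLine(cache.NeedsPrecompilation("Blogs.cs", original)); // prints "False" (unchanged)
        Console.WriteLine(cache.NeedsPrecompilation("Blogs.cs", edited));   // prints "True" (content changed)
    }
}
```

Hashing whole files is coarse: any edit invalidates every query in the file, and a change to the model elsewhere would go unnoticed, so a real implementation would probably need a finer-grained or model-aware key.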
EDIT: Following internal discussion it has become clear that doing this as a source generator isn’t practical (see https://github.com/dotnet/efcore/issues/25009#issuecomment-853735001 below). Instead, this would be a design-time CLI command or similar.
- ~This would most likely be opt-in-only (via a csproj property), and probably makes most sense in Release builds.~
- ~When loading user assemblies (and their dependents), we probably want to isolate them in their own AssemblyLoadContext. This isn’t trivial - we need to take Roslyn-provided syntax tree and semantic models (default assembly loader), transform them into an expression tree, and pass that into the query pipeline isolated inside the special AssemblyLoadContext. In my prototype, the default AssemblyLoadContext is used to avoid these issues.~
@AndriySvyryd yeah. To sum up an internal conversation with @jaredpar:
So yeah, we’ll probably go with a design-time tool (e.g. CLI command). The general plan outlined above should still apply to that (and the need for isolating the user assemblies is no longer relevant).
Alternatively, instead of implementing this as a source generator, it could be a design-time tool that uses Roslyn and avoids the issues related to loading user assemblies and resolving types used in queries.