Read arbitrary trees of arrays of composite types, with specific type handlers for inner elements
For https://github.com/npgsql/efcore.pg/issues/1691, we need a way to efficiently read a column of type `record[]`, which can potentially represent a unique structure for every query. This is more or less the same problem as unmapped composite types, but here the plain `record` OID is used.
Currently, a .NET `object[][]` is returned for a PG `record[]`, which is not good enough when some element has a specific type mapping, for example when using EF.
Note that an item in a record can contain another record, or an array of records, to arbitrary depth.
There are many approaches to deserialize this. Here are some options:
- Some reflection-based approach that takes a `Type` as input and constructs the given object:
```csharp
public class MyType
{
    public int Thing { get; set; }
    public InnerType[] MoreThings { get; set; }
}

public class InnerType
{
    public NpgsqlTimestamp Timestamp { get; set; }
    public int[] IntArray { get; set; }
}

MyType[] records = reader.GetFieldValue<MyType[]>(index);
```
I believe something like this was supported in Npgsql 4 (though not for inner types), but it was dropped.
- A communication API for moving the deserializer forward, step by step. Something like this:

```csharp
// In NpgsqlDataReader:
public void StartDeserializeComplexType(int columnIndex);
public int StartDeserializeArray();  // returns the number of items in the array that follows
public int StartDeserializeRecord(); // returns the number of items in the record that follows
public T DeserializeElement<T>();    // uses the standard type handler for reading the item
```

The last three methods throw if the deserialization state doesn't match what the user expects.
We can then use it like this:

```csharp
reader.StartDeserializeComplexType(index);
MyType[] records = new MyType[reader.StartDeserializeArray()];
for (var i = 0; i < records.Length; i++)
{
    reader.StartDeserializeRecord();
    MyType record = new MyType();
    record.Thing = reader.DeserializeElement<int>();
    record.MoreThings = new InnerType[reader.StartDeserializeArray()];
    for (var j = 0; j < record.MoreThings.Length; j++)
    {
        reader.StartDeserializeRecord();
        InnerType inner = new InnerType();
        inner.Timestamp = reader.DeserializeElement<NpgsqlTimestamp>();
        inner.IntArray = reader.DeserializeElement<int[]>(); // can do this the easy way
        record.MoreThings[j] = inner;
    }
    records[i] = record;
}
```
Alternatively, `StartDeserializeComplexType` could return a new object exposing the methods above, to avoid cluttering `NpgsqlDataReader`, at the expense of an extra allocation (which could, however, be cached…).
- Some visitor-based callback API, so the user gets a callback whenever we enter and leave records, arrays, etc.
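For illustration, the visitor option might have roughly the following shape. This is purely a sketch; none of these names exist in Npgsql, and the interface is invented here:

```csharp
// Hypothetical visitor interface (all names invented for this sketch).
public interface IRecordVisitor
{
    void EnterArray(int length);      // deserializer entered an array of `length` items
    void LeaveArray();
    void EnterRecord(int fieldCount); // deserializer entered a record with `fieldCount` fields
    void LeaveRecord();
    void VisitElement<T>(T value);    // a leaf element, read via the standard type handler
}

// The reader would then drive the traversal, e.g.:
// reader.VisitComplexType(index, myVisitor);
```

Compared to option 2, this inverts control: the reader walks the tree and calls back into user code, rather than the user pulling the deserializer forward step by step.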
While option 1 is probably the easiest to use for ordinary users, it has performance drawbacks (due to reflection), and it may be impossible to map record items, which are unnamed, to the correct property (since C# properties have no guaranteed order unless annotated with an explicit layout). Also, if the returned object tree is only for intermediate use, we waste memory.
Option 2 is, in my opinion, a better fit for how EF materialization works, since EF dynamically generates .NET expression trees which are compiled into a real function. The materialization code can simply generate the (verbose) code that iterates through the record tree and directly builds whatever entity objects it wants, without boxing or creating a bunch of new .NET types.
Also note that option 1 could easily be built as a helper on top of option 2, if we wanted that.
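To make that layering concrete, such a helper might look roughly like this. It uses the hypothetical step API from option 2, plus an assumed non-generic `DeserializeElement(Type)` overload; the by-declaration-order property mapping is also an assumption, not a concrete design:

```csharp
// Sketch: a reflection-based helper (option 1) built on the step API (option 2).
// All reader methods below are the hypothetical ones proposed above.
public static T[] ReadRecordArray<T>(NpgsqlDataReader reader, int index) where T : new()
{
    reader.StartDeserializeComplexType(index);
    var result = new T[reader.StartDeserializeArray()];

    // NB: reflection does not guarantee declaration order, which is exactly
    // the mapping problem with unnamed record fields noted above.
    var props = typeof(T).GetProperties();

    for (var i = 0; i < result.Length; i++)
    {
        var fieldCount = reader.StartDeserializeRecord();
        var item = new T();
        for (var j = 0; j < fieldCount && j < props.Length; j++)
        {
            // A non-generic DeserializeElement(Type) overload is assumed here.
            props[j].SetValue(item, reader.DeserializeElement(props[j].PropertyType));
        }
        result[i] = item;
    }
    return result;
}
```

A real implementation would presumably cache the reflection metadata (or compile a delegate per type) rather than calling `GetProperties` on every read.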
Comments?
Issue Analytics
- Created: 3 years ago
- Comments: 10 (10 by maintainers)
Posted at https://github.com/npgsql/npgsql/pull/3604.
Note that every PostgreSQL table has an implicit composite type which describes its rows. Npgsql doesn't load these "table composites" by default, since there could be a huge number of tables, but you can turn that on with `Load Table Composites=true` in the connection string. However, this only covers projecting out entities, as opposed to anonymous types as in your code sample above. I definitely don't think EF (or the user) should create composite types per query; records do seem like the way to go.

I guess I'm proposing to defer the Npgsql ADO changes (how to read a record) until we're further along on the EF Core side; that would also give us a better idea of exactly which API from Npgsql would best support the EF side.
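For reference, enabling the table composite loading mentioned above is just a connection string switch; something like the following (host/database values are placeholders):

```csharp
// Placeholder host/database; only the Load Table Composites parameter matters here.
var connString = "Host=localhost;Database=mydb;Load Table Composites=true";
await using var conn = new NpgsqlConnection(connString);
await conn.OpenAsync(); // table composite types are discovered on first use
```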
That's true, and I think we'll definitely end up building something like this. I just don't feel I have the entire picture of what this needs to look like yet; by the time we need to implement the actual EF shaper, things will be much clearer.
But I’m definitely open to doing it your way if you feel strongly about it.