Read arbitrary trees of arrays of composite types, with specific type handlers for inner elements

For https://github.com/npgsql/efcore.pg/issues/1691, we need a way to efficiently read a column of type record[] which can potentially represent a unique structure for every query. It more or less deals with the same issue as unmapped composite types, but in this case the plain record oid is used.

Currently, a .NET object[][] is returned for a PG record[], which is not good enough when some element has a specific type mapping, for example when using EF.

Note that an item in a record can contain another record, or array of records, of an arbitrary depth.

There are many approaches to deserialize this. Here are some options:

1. Some reflection-based approach that takes a Type as input and constructs the given object.

public class MyType {
    public int Thing { get; set; }
    public InnerType[] MoreThings { get; set; }
}

public class InnerType {
    public NpgsqlTimestamp Timestamp { get; set; }
    public int[] IntArray { get; set; }
}

MyType[] records = reader.GetFieldValue<MyType[]>(index);

I believe this was kind of supported in Npgsql 4 (but not inner types), but was dropped.

2. A communication API for moving the deserializer forward, step by step. Something like this:

// in NpgsqlDataReader:
public void StartDeserializeComplexType(int columnIndex);
public int StartDeserializeArray(); // returns the number of items in the array that follows
public int StartDeserializeRecord(); // returns the number of items in the record that follows
public T DeserializeElement<T>(); // uses the standard type handler for reading the item

The last three methods throw if the deserialization state doesn’t match what the user expects.

We can then use it like this:

reader.StartDeserializeComplexType(index);
MyType[] records = new MyType[reader.StartDeserializeArray()];
for (var i = 0; i < records.Length; i++) {
    reader.StartDeserializeRecord();
    MyType record = new MyType();
    record.Thing = reader.DeserializeElement<int>();
    record.MoreThings = new InnerType[reader.StartDeserializeArray()];
    for (var j = 0; j < record.MoreThings.Length; j++) {
        reader.StartDeserializeRecord();
        InnerType inner = new InnerType();
        inner.Timestamp = reader.DeserializeElement<NpgsqlTimestamp>();
        inner.IntArray = reader.DeserializeElement<int[]>(); // can do this the easy way
        record.MoreThings[j] = inner;
    }
    records[i] = record;
}

Or let StartDeserializeComplexType return a new object having the methods above, to avoid cluttering NpgsqlDataReader, at the expense of an extra allocation (which however could be cached…).

3. Some visitor with callback API, so the user gets a callback when we enter and leave records and arrays etc.
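To make the visitor idea concrete, here is a minimal sketch under stated assumptions. `IRecordTreeVisitor`, `ArrayNode`, `RecordNode`, `RecordTreeWalker` and `TracingVisitor` are all hypothetical names invented for illustration, not Npgsql types, and the tree is an in-memory stand-in rather than data read from the wire. The `object`-typed `Value` callback boxes, which a real API would presumably want to avoid.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-ins for a decoded array and record.
public sealed class ArrayNode
{
    public readonly object[] Items;
    public ArrayNode(params object[] items) { Items = items; }
}

public sealed class RecordNode
{
    public readonly object[] Fields;
    public RecordNode(params object[] fields) { Fields = fields; }
}

// Hypothetical callback interface: one pair of calls per container, one call per leaf.
public interface IRecordTreeVisitor
{
    void EnterArray(int length);
    void LeaveArray();
    void EnterRecord(int fieldCount);
    void LeaveRecord();
    void Value(object value);
}

public static class RecordTreeWalker
{
    // Walks the tree depth-first and reports every container and leaf to the visitor.
    public static void Walk(object node, IRecordTreeVisitor visitor)
    {
        switch (node)
        {
            case ArrayNode array:
                visitor.EnterArray(array.Items.Length);
                foreach (var item in array.Items) Walk(item, visitor);
                visitor.LeaveArray();
                break;
            case RecordNode record:
                visitor.EnterRecord(record.Fields.Length);
                foreach (var field in record.Fields) Walk(field, visitor);
                visitor.LeaveRecord();
                break;
            default:
                visitor.Value(node);
                break;
        }
    }
}

// Example visitor that only records the order of the callbacks it receives.
public sealed class TracingVisitor : IRecordTreeVisitor
{
    public readonly List<string> Events = new List<string>();
    public void EnterArray(int length) { Events.Add($"array({length})"); }
    public void LeaveArray() { Events.Add("end array"); }
    public void EnterRecord(int fieldCount) { Events.Add($"record({fieldCount})"); }
    public void LeaveRecord() { Events.Add("end record"); }
    public void Value(object value) { Events.Add($"value({value})"); }
}
```

Compared with a pull-style API, the caller here has to keep its own state between callbacks in order to know which property a given value belongs to.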

While option 1 is probably the easiest for normal users, it has performance drawbacks (due to reflection), and it may be unable to map record items, which are unnamed, to the correct property (since properties are unordered in C# unless annotated to be laid out in a particular order). Also, if the returned object tree is only for intermediate use, we waste memory.
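The ordering problem can be illustrated with a small reflection-based mapper. This is only a sketch: `RecordItemAttribute`, `RecordMapper` and `LabeledId` are hypothetical names, and the record is modeled as a plain `object[]`. The explicit ordinal is needed because `Type.GetProperties` is not documented to return properties in any particular order.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical attribute that pins a property to a position in the unnamed record.
[AttributeUsage(AttributeTargets.Property)]
public sealed class RecordItemAttribute : Attribute
{
    public int Ordinal { get; }
    public RecordItemAttribute(int ordinal) { Ordinal = ordinal; }
}

public static class RecordMapper
{
    // Builds a T from positional record items, matching items to the annotated
    // properties sorted by their declared ordinal.
    public static T Map<T>(object[] items) where T : class, new()
    {
        var properties = typeof(T).GetProperties()
            .Select(p => new { Property = p, Attribute = p.GetCustomAttribute<RecordItemAttribute>() })
            .Where(x => x.Attribute != null)
            .OrderBy(x => x.Attribute.Ordinal)
            .Select(x => x.Property)
            .ToArray();

        if (properties.Length != items.Length)
            throw new InvalidOperationException(
                $"Record has {items.Length} items but {typeof(T).Name} maps {properties.Length}.");

        var result = new T();
        for (var i = 0; i < items.Length; i++)
            properties[i].SetValue(result, items[i]); // throws if the item type does not fit
        return result;
    }
}

// Example target type: declaration order deliberately differs from record order.
public class LabeledId
{
    [RecordItem(1)] public string Label { get; set; }
    [RecordItem(0)] public int Id { get; set; }
}
```

Every item passes through `object` and `SetValue` here, which is exactly the boxing and reflection overhead described above.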

Option 2 is, in my opinion, a better fit for how EF materialization works, as it dynamically generates .NET Expressions which will be compiled to a real function. The materialization code can just generate the (verbose) code needed to iterate through the record tree and build whatever entity objects it wants directly, without any boxing and without creating a bunch of new .NET types.
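The state checking that option 2 relies on can be sketched with a self-contained mock. Everything here is hypothetical: `RecordCursor`, `PgArray`, `PgRecord` and `CursorDemo` are illustration-only names, the method names are shortened variants of the proposal, and the mock walks a boxed in-memory tree, whereas a real implementation would read from the wire buffer through the type handlers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-ins for a decoded PostgreSQL array and record.
public sealed class PgArray
{
    public readonly object[] Items;
    public PgArray(params object[] items) { Items = items; }
}

public sealed class PgRecord
{
    public readonly object[] Fields;
    public PgRecord(params object[] fields) { Fields = fields; }
}

// Hypothetical cursor, not an Npgsql type. It walks the tree depth-first and
// throws when the caller's expectation does not match the next value.
public sealed class RecordCursor
{
    private sealed class Frame
    {
        public object[] Items;
        public int Index;
    }

    private readonly Stack<Frame> _stack = new Stack<Frame>();

    public RecordCursor(object root)
    {
        _stack.Push(new Frame { Items = new[] { root } });
    }

    // Returns the number of items in the array that follows.
    public int StartArray()
    {
        if (!(Next() is PgArray array))
            throw new InvalidOperationException("Expected an array.");
        _stack.Push(new Frame { Items = array.Items });
        return array.Items.Length;
    }

    // Returns the number of fields in the record that follows.
    public int StartRecord()
    {
        if (!(Next() is PgRecord record))
            throw new InvalidOperationException("Expected a record.");
        _stack.Push(new Frame { Items = record.Fields });
        return record.Fields.Length;
    }

    // Reads the next leaf value, checking that it has the requested type.
    public T ReadElement<T>()
    {
        var value = Next();
        if (value is T typed) return typed;
        throw new InvalidOperationException($"Expected {typeof(T).Name}.");
    }

    private object Next()
    {
        // Leave every container whose items have all been consumed.
        while (_stack.Count > 0 && _stack.Peek().Index == _stack.Peek().Items.Length)
            _stack.Pop();
        if (_stack.Count == 0)
            throw new InvalidOperationException("No more values to read.");
        var frame = _stack.Peek();
        return frame.Items[frame.Index++];
    }
}

public static class CursorDemo
{
    // Walks an array of (int, int[]) records and sums every integer it finds.
    public static int Sum(object root)
    {
        var cursor = new RecordCursor(root);
        var total = 0;
        var count = cursor.StartArray();
        for (var i = 0; i < count; i++)
        {
            cursor.StartRecord();
            total += cursor.ReadElement<int>();
            foreach (var n in cursor.ReadElement<int[]>()) total += n;
        }
        return total;
    }
}
```

Because containers are left implicitly once their last item has been read, the caller never needs an explicit "end" call, which keeps generated materialization code short.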

Also note that option 1 can easily be built as a helper function on top of option 2 if we want that.

Comments?

Top GitHub Comments

Emill commented, Mar 17, 2021

Posted at https://github.com/npgsql/npgsql/pull/3604.

roji commented, Feb 22, 2021

> But anyway, I find it a bit hard how to use the pre-mapped composite types with EF, especially since they need to be added to the pg catalog first.

Note that every PostgreSQL table has an implicit composite type which describes its rows; Npgsql doesn’t load these “table composites” by default - since there could be a huge number of tables - but you can turn that on with Load Table Composites=true on the connection string. But this indeed only covers projecting out entities, as opposed to anonymous types as in your code sample above. I definitely don’t think EF (or the user) should create composite types per-query - records do seem like the way to go.
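For reference, a connection string with that setting could look like the fragment below. The host, database and user name are placeholders; only the `Load Table Composites=true` keyword comes from the comment above.

```csharp
// Placeholder host, database and user; the last keyword enables table composites.
const string connectionString =
    "Host=localhost;Database=mydb;Username=myuser;Load Table Composites=true";
```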

I guess I’m proposing to defer the Npgsql ADO changes (how to read a record) until we’re further along on the EF Core side - that would also give us a better idea of exactly which API from Npgsql would be best to support the EF side.

> Also note that there might be other ORMs than EF that might be interested in this, where it is easier to build this kind of support.

That’s true, and I think we’ll definitely end up building something like this. I just don’t feel I have the entire picture of what this needs to look like yet; by the time we need to implement the actual EF shaper, things should be much clearer.

But I’m definitely open to doing it your way if you feel strongly about it.
