
Automatically parsing subqueries as composite type


So in PostgreSQL one can do something like:

SELECT _id, body, (SELECT posts FROM posts WHERE posts._id=comments.post_id) AS post FROM comments

Which returns post as a composite type. Currently, this package returns:

{ _id: 'TWjntjispSBvbZQtv',
    body: { title: 'Comment title 0' },
    post: '(XKP3PeHoDRW2a7hFi,"{""title"": ""Post title 0""}")' }

Ideally, it should return something like:

{ _id: '5B4gBiAnhGY64W4GR',
    body: { title: 'Comment title 0' },
    post:
     { _id: 'yDwGwJ6x5Qc8HhotD', body: { title: 'Post title 0' } } }

I know I can achieve this with a query like:

SELECT _id, body, (SELECT to_json(posts) FROM posts WHERE posts._id=comments.post_id) AS post FROM comments

But frankly, I am not sure why I would want to convert it to JSON and back just to get it over the wire in the correct structure.

I have seen some other issues about composite types, but from my short exploration this seems doable. PostgreSQL appears to expose the necessary information, though it might require additional queries, which could be cached, I believe.

I tried the following script:

const util = require('util');

const {Pool} = require('pg');

const CONNECTION_CONFIG = {
  user: 'postgres',
  database: 'postgres',
  password: 'pass',
};

const pool = new Pool(CONNECTION_CONFIG);

(async () => {
  await pool.query(`
    CREATE OR REPLACE FUNCTION random_id() RETURNS TEXT LANGUAGE SQL AS $$
      SELECT array_to_string(
        array(
          SELECT SUBSTRING('23456789ABCDEFGHJKLMNPQRSTWXYZabcdefghijkmnopqrstuvwxyz' FROM floor(random()*55)::int+1 FOR 1) FROM generate_series(1, 17)
        ),
        ''
      );
    $$;
    DROP TABLE IF EXISTS comments;
    DROP TABLE IF EXISTS posts;
    CREATE TABLE posts (
      _id CHAR(17) PRIMARY KEY DEFAULT random_id(),
      body JSONB NOT NULL DEFAULT '{}'::JSONB
    );
    CREATE TABLE comments (
      _id CHAR(17) PRIMARY KEY DEFAULT random_id(),
      post_id CHAR(17) NOT NULL REFERENCES posts(_id),
      body JSONB NOT NULL DEFAULT '{}'::JSONB
    );
    DELETE FROM comments;
    DELETE FROM posts;
  `);

  let result;
  for (let i = 0; i < 5; i++) {
    result = await pool.query(`
      INSERT INTO posts (body) VALUES($1) RETURNING _id;
    `, [{'title': `Post title ${i}`}]);

    const postId = result.rows[0]._id;

    for (let j = 0; j < 10; j++) {
      await pool.query(`
        INSERT INTO comments (post_id, body) VALUES($1, $2);
      `, [postId, {'title': `Comment title ${j}`}]);
    }
  }

  result = await pool.query(`
    SELECT _id, body, (SELECT posts FROM posts WHERE posts._id=comments.post_id) AS post FROM comments
  `);

  console.log(util.inspect(result.rows, {depth: null}));

  result = await pool.query(`
    SELECT attname, atttypid FROM pg_catalog.pg_attribute LEFT JOIN pg_catalog.pg_type ON (pg_attribute.attrelid=pg_type.typrelid) WHERE pg_type.oid=$1 AND attisdropped=false AND attnum>=1 ORDER BY attnum
  `, [result.fields[2].dataTypeID]);

  console.log(util.inspect(result.rows, {depth: null}));

  await pool.end();
})();

Results:

[ { _id: 'BgCnoip8HiNt5Lant',
    body: { title: 'Comment title 0' },
    post: '(hjZPGf2ykyeYgFhCe,"{""title"": ""Post title 0""}")' },
  ...,
  { _id: 'Rh3cc3qGHN7v3Seq3',
    body: { title: 'Comment title 9' },
    post: '(zLLeSQFyt844ZX2a6,"{""title"": ""Post title 4""}")' } ]
[ { attname: '_id', atttypid: 1042 },
  { attname: 'body', atttypid: 3802 } ]

As you can see, it is possible to get both names and type IDs for the values inside post. With some recursion, this could be converted, no? Information on how to parse the tuple itself is here. And then we would have to parse each individual element.
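Such a conversion can be sketched in Node.js. This is only an illustration, not part of pg: it parses PostgreSQL's composite text format (parenthesized, comma-separated fields, with double quotes escaped by doubling), and the column names are hard-coded stand-ins for what the pg_attribute query above would return.

```javascript
// Minimal sketch of parsing the composite text format, e.g.
// '(XKP3PeHoDRW2a7hFi,"{""title"": ""Post title 0""}")'.
function parseComposite(text) {
  // Strip the surrounding parentheses.
  const inner = text.slice(1, -1);
  const fields = [];
  let i = 0;
  while (i <= inner.length) {
    if (inner[i] === '"') {
      // Quoted field: '""' inside is an escaped quote character.
      let value = '';
      i++; // skip opening quote
      while (i < inner.length) {
        if (inner[i] === '"') {
          if (inner[i + 1] === '"') { value += '"'; i += 2; }
          else { i++; break; } // closing quote
        } else {
          value += inner[i++];
        }
      }
      fields.push(value);
      i++; // skip the field-separating comma
    } else {
      // Unquoted field: runs until the next comma; empty means NULL.
      let end = inner.indexOf(',', i);
      if (end === -1) end = inner.length;
      const raw = inner.slice(i, end);
      fields.push(raw === '' ? null : raw);
      i = end + 1;
    }
  }
  return fields;
}

// Hypothetical column metadata, as the pg_attribute query would report it.
const columns = ['_id', 'body'];
const parsed = parseComposite(
  '(XKP3PeHoDRW2a7hFi,"{""title"": ""Post title 0""}")'
);
const post = {};
columns.forEach((name, idx) => { post[name] = parsed[idx]; });
// Every field is still a string at this point; a full solution would
// recurse, dispatching on atttypid (e.g. 3802 = JSONB) to the per-type
// parsers, which is what the issue is asking pg to do automatically.
post.body = JSON.parse(post.body);
```

This handles the flat case; nested composites and arrays inside fields would need the same parser applied recursively, driven by the type metadata.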

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Reactions: 3
  • Comments: 13 (6 by maintainers)

Top GitHub Comments

mitar commented, May 31, 2019 (2 reactions)

the tuple format is now considered a legacy, sort-of obsolete, whereas JSON+JSONB is the new recommended approach

Can you provide any support for this claim? I have not seen that anywhere.

Moreover, JSON does not support many types of values. For example, Infinity and NaN cannot be transported with standard JSON (and PostgreSQL uses strict standard JSON).
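The limitation is easy to demonstrate on the Node.js side (a generic JavaScript fact, not specific to pg):

```javascript
// Standard JSON has no representation for Infinity or NaN:
// JSON.stringify silently turns both into null, losing the values.
console.log(JSON.stringify({x: Infinity, y: NaN}));
// prints {"x":null,"y":null}
```

So any float column holding such values cannot round-trip losslessly through a plain-JSON wire format.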

And this is where PostgreSQL team has been focusing after v9 of the server, to make JSON as fast as possible.

That is true. JSON is really fast. Moreover, parsing JSON on the Node.js side is much faster than anything else, because it is so heavily optimized in Node.js. I made some benchmarks to confirm that.
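A rough sketch of such a micro-benchmark (illustrative only; these are not the original benchmarks, and absolute numbers are machine-dependent):

```javascript
// Measure how fast JSON.parse handles a payload shaped like the rows above.
const payload = JSON.stringify(
  Array.from({length: 10000}, (_, i) => ({
    _id: String(i).padStart(17, '0'),
    body: {title: `Post title ${i}`},
  }))
);

const start = process.hrtime.bigint();
const rows = JSON.parse(payload);
const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
console.log(`parsed ${rows.length} rows in ${elapsedMs.toFixed(2)} ms`);
```

Comparing this against a hand-rolled tuple-text parser over the same data would quantify the trade-off discussed below.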

On the other hand, parsing JSON is recursive, and you do not know what you are getting until you get it, so you have to scan tokens and then convert them. With the tuple approach from PostgreSQL, you have all the information about the structure in advance, so at least in theory you should be able to parse the object directly, knowing what is where, and just map values without first scanning what each token is. I have seen this done in some projects, but I cannot find a reference now.

Furthermore, if messaging were done in something like Cap'n Proto, parsing would take zero time, instead of the current conversion from one in-memory representation to another by copying.

vitaly-t commented, May 31, 2019 (2 reactions)

@noinkling You are looking at it in the wrong way. The main reason I abandoned my attempts was because the tuple format is now considered a legacy, sort-of obsolete, whereas JSON+JSONB is the new recommended approach. And this is where PostgreSQL team has been focusing after v9 of the server, to make JSON as fast as possible. Tuples are a history now. You should avoid using them.
