Using Postgres arrays in relations for very significant performance improvements?

Having done a fair bit of investigation into many ORMs, I have found what seems to be a ubiquitous design flaw in their Postgres implementations. This doesn’t just apply to Node ORMs but to all the ORMs I’ve investigated, which is by no means an exhaustive list.

This design flaw means that significantly more IO is required to support one-to-many and many-to-many relationships than would be needed using the “advanced” features of Postgres.

Currently all ORMs that support *-to-many relationships do so either by using a “join table” in the case of many-to-many or by referencing a foreign key on the “remote” table in the case of one-to-many.

As Postgres has the ability to store “array values”, this means it has the ability to store a “multi-valued” reference to the remote table directly from the local table thereby eliminating the need for a join table. Depending on the number of related objects, this approach has the potential to reduce the amount of logical IO required to manage relationships by a factor of 10 or even 100+.

Unfortunately, no ORMs seem to take advantage of this ability.

(Note: Needing a foreign key on the remote table in order to implement a “hasMany” relationship also violates something vaguely like the “Law of Demeter / Principle of Least Knowledge”. Why should the remote table have to “know” anything about the referencing table?)

Having done my investigation of ORMs, I would really like to use Bookshelf, as it seems the best by a significant margin and for a number of reasons. BUT I can’t bring myself to sacrifice the performance / scalability improvements available with Postgres, and I don’t want to end up having to maintain my own fork of Bookshelf just for what would probably end up being a “minor” code change.

So, the reasons for this post are:

a) If you agree with my reasoning, it might be relatively easy for you to make the required changes to support this functionality in Postgres (obviously it won’t work for other SQL databases).

b) There’s no point in me doing the work of making changes to Bookshelf if any pull request were to fall on stony ground.

If nothing else, even a “sorry, not interested” would be helpful as then I will know where I stand.

Original post, providing an example of my problem, below.

I’m trying to set up relations that make use of Postgres’ ability to store array values in a single column. So rather than using a join table or a foreign key on the remote table, table 1 would have a column with a list of references to table 2.

CREATE TABLE properties
(
  id serial NOT NULL,
  title character varying(100),
  created_at timestamp with time zone,
  updated_at timestamp with time zone,
  media_id integer[],
  CONSTRAINT property_id PRIMARY KEY (id)
);

CREATE TABLE media
(
  id serial NOT NULL,
  filepath character varying(250),
  created_at timestamp with time zone,
  updated_at timestamp with time zone,
  CONSTRAINT media_id PRIMARY KEY (id)
);

Each row’s media_id column stores multiple references to primary keys in the media table.

I had hoped I’d be able to find a configuration option for hasMany that would allow it to retrieve related media directly using properties.media_id rather than needing a property_id field on the media table, but it’s looking like this isn’t the case?

So, assuming this isn’t possible directly with any of the standard Bookshelf relation functions, I’m wondering if there’s some “sneaky” way to have some kind of “custom relation” type where I can pass in a “join function”.
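To make the idea of a “custom relation” with a “join function” concrete, here is a minimal sketch in plain JavaScript. It is not a Bookshelf API: `arrayRelation`, `fetchMediaByIds` and the in-memory `mediaTable` are hypothetical names invented for illustration, and the fetch function stands in for whatever database call would really be used.

```javascript
// Hypothetical helper, NOT part of Bookshelf: resolves an array-valued
// reference column by delegating to a caller-supplied "join function".
function arrayRelation(column, fetchByIds) {
  return async function related(row) {
    const ids = row[column] || [];
    if (ids.length === 0) return [];
    const rows = await fetchByIds(ids);
    // Return rows in the order their ids appear in the array column,
    // skipping ids that no longer exist in the remote table.
    const byId = new Map(rows.map((r) => [r.id, r]));
    return ids.filter((id) => byId.has(id)).map((id) => byId.get(id));
  };
}

// Stand-in for the media table; a real join function would query Postgres.
const mediaTable = [
  { id: 1, filepath: 'a.jpg' },
  { id: 2, filepath: 'b.jpg' },
  { id: 3, filepath: 'c.jpg' },
];

// Hypothetical join function: fetch media rows for a list of ids.
async function fetchMediaByIds(ids) {
  return mediaTable.filter((m) => ids.includes(m.id));
}

// "properties.media_id" drives the lookup; media needs no property_id column.
const propertyMedia = arrayRelation('media_id', fetchMediaByIds);
```

Calling `propertyMedia({ id: 10, media_id: [3, 1] })` resolves to the media rows with ids 3 and 1, in that order. The point of the design is that only the local row carries knowledge of the relationship.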

The query I need to execute is along the lines of

select * from media 
   where id = ANY ((select media_id from properties where id = $1)::INT[])

or

select * from media where id in ($media_id_from_property_model)
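Both forms can be written as parameterised queries. The sketch below only builds the query text and bind values, in the `{ text, values }` shape that node-postgres accepts as a query config; the function names are hypothetical. Note one substitution: the `in (...)` form is expressed as `= ANY ($1::int[])` so that the whole array travels as a single bind parameter, on the assumption that the driver serialises a JavaScript array as a Postgres array (node-postgres does).

```javascript
// First form: the array is read and cast inside Postgres,
// so only the property id is sent as a parameter.
function mediaForPropertyId(propertyId) {
  return {
    text:
      'select * from media ' +
      'where id = ANY ((select media_id from properties where id = $1)::int[])',
    values: [propertyId],
  };
}

// Second form: the property row is already loaded, so its media_id
// array is passed as one array-valued parameter.
function mediaForLoadedProperty(property) {
  return {
    text: 'select * from media where id = ANY ($1::int[])',
    values: [property.media_id || []],
  };
}
```

With node-postgres, either result could be passed directly to `client.query(...)`. The second form costs an extra round trip if the property row has not already been fetched, while the first does everything in one statement.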

Any suggestions would be appreciated.

Issue Analytics

  • State: closed
  • Created 9 years ago
  • Reactions: 5
  • Comments: 15 (3 by maintainers)

Top GitHub Comments

13 reactions
simg commented, Dec 18, 2014

No benchmarks, but the performance benefits are obvious. With arrays, you avoid the need for multiple redundant database reads of the join table.

8 reactions
simg commented, May 10, 2016

I finally got around to running some benchmarks on Arrays vs Join Tables.

Created a Gist along with my benchmark code.

https://gist.github.com/simg/2f28e9dcb6207dbaa11a285021935fe2

tl;dr: Array references are 5 times faster than join tables for retrieving “objects”. Arrays are up to twice as fast for inserting “objects”, but only as the number of relationships gets quite high (e.g. ~100 or so).
