Proposal: implement KeystoneJS's relationship feature directly into mongoose

I’ve developed a useful feature in KeystoneJS that lets you populate a relationship from either side, while only storing the data on one side, and am looking for feedback on whether it is something that could / should be brought back into mongoose itself. (It might be possible to add as a separate package but I suspect there’d be too much rewriting of mongoose internals for that to be a good idea).

This is hard to explain without examples, so I’ve used Posts and Categories as a basic, contrived one to demonstrate what I’m talking about. In reality you’d rarely load all the posts for a category, but there are real-world cases where doing so is less unreasonable, and Posts with Categories is an easy way to demo it.

_Note: I originally wrote this as a gist, which is easier to read the examples in, and can be found here: https://gist.github.com/JedWatson/8519978_

The problem

The built-in population feature is really useful; not just for simple examples (like below) but if you’re passing around a query (promise) in a sophisticated app, you can encapsulate both the query filters and the full dataset it should return, while allowing it to be modified by another method before it is run.

There are many ways to implement relationships between collections in mongo, one of the most performant (from a read perspective) being to store the relationship data on both models. Mongoose’s population support also means this is one of the easiest to code against in many scenarios.

It requires a lot of management though; keeping arrays in sync means using pre/post save middleware, or piping any changes to the arrays through specific methods that keep them in sync for you.

Here’s a basic implementation of this:

// This is a contrived example to demonstrate duplicating
// data on both sides of a relationship to enable population

var PostSchema = new mongoose.Schema({
  title: String,
  slug: String,
  contents: String,
  categories: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Category' }]
});

var Post = mongoose.model('Post', PostSchema);

var CategorySchema = new mongoose.Schema({
  name: String,
  posts: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Post' }]
});

var Category = mongoose.model('Category', CategorySchema);

// now we can populate on both sides

Post.find().populate('categories').exec(function(err, posts) { /* ... */ });
Category.find().populate('posts').exec(function(err, categories) { /* ... */ });
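
As a rough illustration of the bookkeeping this implies, here is a minimal in-memory sketch of the "specific methods" approach. It uses plain objects rather than mongoose documents, and `linkPostToCategory` and `unlinkPostFromCategory` are hypothetical helper names, not part of mongoose. In a real app each call would also have to save both documents.

```javascript
// Hypothetical helpers: every change to the relationship goes through these
// so that both arrays stay in sync. Plain objects stand in for documents.
function linkPostToCategory(post, category) {
  if (!post.categories.includes(category.id)) post.categories.push(category.id);
  if (!category.posts.includes(post.id)) category.posts.push(post.id);
}

function unlinkPostFromCategory(post, category) {
  post.categories = post.categories.filter((id) => id !== category.id);
  category.posts = category.posts.filter((id) => id !== post.id);
}

const post = { id: 'p1', title: 'Hello', categories: [] };
const news = { id: 'c1', name: 'News', posts: [] };

linkPostToCategory(post, news);
linkPostToCategory(post, news); // linking twice does not duplicate the ids
console.log(post.categories, news.posts); // each array holds one id

unlinkPostFromCategory(post, news);
console.log(post.categories, news.posts); // both arrays are empty again
```

Forgetting to route a single update through helpers like these leaves the two sides disagreeing, which is the maintenance cost described above.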

I think it’s better in many cases to store relationships on one side only: either a single reference or an array of references on the primary Model. There’s nothing to keep in sync and a single ‘source of truth’, but it requires more code to query from the secondary Model.

Here’s a (very rough) example implementation of this:

// This is a contrived example to demonstrate
// storing data on one side of a relationship

var PostSchema = new mongoose.Schema({
  title: String,
  slug: String,
  contents: String,
  categories: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Category' }]
});

var Post = mongoose.model('Post', PostSchema);

var CategorySchema = new mongoose.Schema({
  name: String
});

var Category = mongoose.model('Category', CategorySchema);

// easy to get the categories for a post...

Post.find().populate('categories').exec(function(err, posts) {
  // you have posts with categories
});

// but harder to get the posts for a category.

Category.find().exec(function(err, categories) {
  // handle err
  async.forEach(categories, function(category, done) {
    Post.find().where('categories').in([category.id]).exec(function(err, posts) {
      category.posts = posts;
      done(err);
    });
  }, function(err) {
    // ... you have categories with posts
  });
});
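
The loop above issues one query per category. A common alternative is to fetch every matching post in a single query (for example with an `$in` filter on `categories`) and group the results in memory. The sketch below shows only the grouping step, on plain objects. `attachPostsToCategories` is a hypothetical helper, and ids are compared as strings.

```javascript
// Hypothetical helper: given categories and the posts returned by ONE query
// such as Post.find({ categories: { $in: categoryIds } }), attach each post
// to every category it references.
function attachPostsToCategories(categories, posts) {
  const byId = new Map();
  categories.forEach((category) => {
    category.posts = [];
    byId.set(String(category.id), category);
  });
  posts.forEach((post) => {
    post.categories.forEach((categoryId) => {
      const category = byId.get(String(categoryId));
      if (category) category.posts.push(post);
    });
  });
  return categories;
}

const categories = [{ id: 'c1', name: 'News' }, { id: 'c2', name: 'Tips' }];
const posts = [
  { id: 'p1', title: 'First', categories: ['c1'] },
  { id: 'p2', title: 'Second', categories: ['c1', 'c2'] }
];

attachPostsToCategories(categories, posts);
console.log(categories.map((c) => c.name + ': ' + c.posts.length));
// prints [ 'News: 2', 'Tips: 1' ]
```

This trades N small queries for one larger query plus a pass over the results, which is usually the cheaper option once there are more than a handful of categories.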

The solution

I have developed a Relationship feature in Keystone that lets you populate one-sided relationships as if they were two-sided, by specifying a relationship on the secondary schema, and propose we implement it (better) in mongoose itself.

In Keystone there is a (currently undocumented) populateRelated method that is created on Lists. Here’s an example of how this works:

// This is how you could use Keystone's features to simplify
// managing the relationship. Posts have gained authors.

var Post = new keystone.List('Post', {
  autokey: { path: 'slug', from: 'title', unique: true }
}).add({
  title: String,
  contents: keystone.Types.Markdown,
  author: { type: keystone.Types.Relationship, ref: 'User' },
  categories: { type: keystone.Types.Relationship, ref: 'Category', many: true }
});
Post.register();

var Category = new keystone.List('Category').add({
  name: String
});
Category.relationship({ path: 'posts', ref: 'Post', refPath: 'categories' });
Category.register();

// we can populate categories on posts using mongoose's populate
Post.model.find().populate('categories').exec(function(err, posts) { /* ... */ });

// there's one more step, but we can use keystone's populateRelated to achieve a similar effect for posts in categories
Category.model.find().exec(function(err, categories) {
  keystone.populateRelated(categories, 'posts', function(err) {
    // ... you have categories with posts
  });
});

// if you've got a single document you want to populate a relationship on, it's neater
Category.model.findOne().exec(function(err, category) {
  category.populateRelated('posts', function(err) {
    // posts is populated
  });
});

// if you also wanted to populate the author on the posts loaded for each category,
// you can do that too - it uses mongoose's populate method because the relationship
// is stored in a path on the Post
Category.model.findOne().exec(function(err, category) {
  category.populateRelated('posts[author]', function(err) {
    // posts is populated, and each author on each post is populated
  });
});

The biggest downside to implementing it outside of mongoose’s populate functionality is it can’t be queued before the query is executed, so it has to be used similarly to the populate method that is available to Document objects. There’s also a (currently horribly inefficient) method on keystone itself to run this for all documents in an array.

If we brought this across, you could call a method on the mongoose Schema telling it that another Schema holds a ref to it in a path (a single reference for one-to-many or an array for many-to-many, which could be detected from the related Schema). This relationship would then be taken into account by the populate method and, although the underlying queries would be different, it would be treated like a path storing { type: mongoose.Schema.Types.ObjectId, ref: '(primary Model)' }.
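
As a rough sketch of the detection step: if the related schema's definition is available as a plain object, whether the reverse relationship is backed by a single reference or an array can be read from the shape of the referenced path. `describeRelationship`, the `definitions` registry and the string `'ObjectId'` type markers are assumptions made for illustration, not mongoose internals.

```javascript
// Hypothetical: schema definitions kept as plain objects, keyed by model name.
const definitions = {
  Post: {
    title: String,
    author: { type: 'ObjectId', ref: 'User' },
    categories: [{ type: 'ObjectId', ref: 'Category' }]
  }
};

// Inspect the path on the related definition that stores the reference and
// report whether it holds a single reference or an array of references.
function describeRelationship(modelName, relationship) {
  const related = definitions[relationship.ref];
  if (!related) throw new Error('Unknown model: ' + relationship.ref);
  const field = related[relationship.refPath];
  if (field === undefined) throw new Error('Unknown path: ' + relationship.refPath);
  const isArray = Array.isArray(field);
  const target = isArray ? field[0] : field;
  if (!target || target.ref !== modelName) {
    throw new Error(relationship.refPath + ' does not reference ' + modelName);
  }
  return { path: relationship.path, storedAs: isArray ? 'array' : 'single' };
}

console.log(describeRelationship('Category', { path: 'posts', ref: 'Post', refPath: 'categories' }));
console.log(describeRelationship('User', { path: 'posts', ref: 'Post', refPath: 'author' }));
```

Validating that the referenced path really points back at the declaring model catches misconfigured relationships when the schema is defined rather than when a query runs.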

Here’s how I propose this would work if implemented natively in mongoose:

// This is the same as the Keystone relationships example, but written as if
// the functionality were built into mongoose itself. Much simpler to use.

var PostSchema = new mongoose.Schema({
  title: String,
  slug: String,
  contents: String,
  categories: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Category' }]
});

var Post = mongoose.model('Post', PostSchema);

var CategorySchema = new mongoose.Schema({
  name: String
});

CategorySchema.relationship({ path: 'posts', ref: 'Post', refPath: 'categories' });

var Category = mongoose.model('Category', CategorySchema);

// with proper integration with populate, no need for another nested function!
Category.find().populate('posts').exec(function(err, categories) {
  // ... you have categories with posts
});

Or alternatively:

// An alternative way of configuring relationships on Schemas in mongoose,
// if it were implemented as a SchemaType instead of a separate method.

// Not sure if this would be better or worse to implement in mongoose!

var PostSchema = new mongoose.Schema({
  title: String,
  slug: String,
  contents: String,
  categories: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Category' }]
});

var Post = mongoose.model('Post', PostSchema);

var CategorySchema = new mongoose.Schema({
  name: String,
  posts: { type: mongoose.Schema.Types.Relationship, ref: 'Post', refPath: 'categories' }
});

var Category = mongoose.model('Category', CategorySchema);

// with proper integration with populate, no need for another nested function!
Category.find().populate('posts').exec(function(err, categories) {
  // ... you have categories with posts
});

If this is something that would be welcomed in mongoose I’d be happy to help implement it (but I’ve looked through the code and might need a primer from somebody who understands how populate works better than I do). The method in Keystone is currently fairly rough; it works, but the performance (and implementation) leaves a bit to be desired. If this stays a Keystone-specific feature, I’d love someone with more experience wrangling performance in mongoose to help me improve it.

The actual implementation in Keystone can be found here: