Proposal: implement KeystoneJS's relationship feature directly into mongoose
I’ve developed a useful feature in KeystoneJS that lets you populate a relationship from either side, while only storing the data on one side, and am looking for feedback on whether it is something that could / should be brought back into mongoose itself. (It might be possible to add as a separate package, but I suspect there’d be too much rewriting of mongoose internals for that to be a good idea.)
Hard to explain without examples so I’ve used Posts and Categories as a basic, contrived one to demonstrate what I’m talking about here; in reality you’d rarely load all the posts for a category but there are other real world cases where it’s less unreasonable you’d want to do this, and Posts w/ Categories is an easy way to demo it.
_Note: I originally wrote this as a gist, which is easier to read the examples in, and can be found here: https://gist.github.com/JedWatson/8519978_
The problem
The built-in population feature is really useful; not just for simple examples (like below) but if you’re passing around a query (promise) in a sophisticated app, you can encapsulate both the query filters and the full dataset it should return, while allowing it to be modified by another method before it is run.
There are many ways to implement relationships between collections in mongo, one of the most performant (from a read perspective) being to store the relationship data on both models. Mongoose’s population support also means this is one of the easiest to code against in many scenarios.
It requires a lot of management though; keeping arrays in sync means using pre/post save middleware, or piping any changes to the arrays through specific methods that keep them in sync for you.
Here’s a basic implementation of this:
// This is a contrived example to demonstrate duplicating
// data on both sides of a relationship to enable population
var mongoose = require('mongoose');
var PostSchema = new mongoose.Schema({
  title: String,
  slug: String,
  contents: String,
  categories: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Category' }]
});
var Post = mongoose.model('Post', PostSchema);
var CategorySchema = new mongoose.Schema({
  name: String,
  posts: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Post' }]
});
var Category = mongoose.model('Category', CategorySchema);
// now we can populate on both sides
Post.find().populate('categories').exec(function(err, posts) { /* ... */ });
Category.find().populate('posts').exec(function(err, categories) { /* ... */ });
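The schemas above don't show the syncing work itself. Here is a minimal sketch of what "piping changes through specific methods" might look like, using plain in-memory objects rather than mongoose models. The helper names addPostToCategory and removePostFromCategory are made up for illustration; in a real app the same logic would sit in save middleware or model methods and would also need to persist both documents.

```javascript
// In-memory stand-ins for Post and Category documents (not mongoose models).
const posts = new Map();      // id -> { id, title, categories: [categoryId] }
const categories = new Map(); // id -> { id, name, posts: [postId] }

// Hypothetical helper: link a post and a category on BOTH sides.
function addPostToCategory(postId, categoryId) {
  const post = posts.get(postId);
  const category = categories.get(categoryId);
  if (!post || !category) throw new Error('unknown post or category');
  // Guard against duplicates so repeated calls stay idempotent.
  if (!post.categories.includes(categoryId)) post.categories.push(categoryId);
  if (!category.posts.includes(postId)) category.posts.push(postId);
}

// Hypothetical helper: unlink on BOTH sides.
function removePostFromCategory(postId, categoryId) {
  const post = posts.get(postId);
  const category = categories.get(categoryId);
  if (!post || !category) throw new Error('unknown post or category');
  post.categories = post.categories.filter((id) => id !== categoryId);
  category.posts = category.posts.filter((id) => id !== postId);
}

posts.set('p1', { id: 'p1', title: 'Hello', categories: [] });
categories.set('c1', { id: 'c1', name: 'News', posts: [] });
addPostToCategory('p1', 'c1');
```

Every write that bypasses these helpers leaves the two arrays out of step, which is the maintenance cost being described.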
I think it’s better to store relationships on one side only in many cases: either a single reference or an array of references on the primary Model. There’s nothing to keep in sync and only one ‘source of truth’, but it requires more code to query (from the secondary Model).
Here’s a (very rough) example implementation of this:
// This is a contrived example to demonstrate
// storing data on one side of a relationship
var mongoose = require('mongoose');
var async = require('async');
var PostSchema = new mongoose.Schema({
  title: String,
  slug: String,
  contents: String,
  categories: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Category' }]
});
var Post = mongoose.model('Post', PostSchema);
var CategorySchema = new mongoose.Schema({
  name: String
});
var Category = mongoose.model('Category', CategorySchema);
// easy to get the categories for a post...
Post.find().populate('categories').exec(function(err, posts) {
  // you have posts with categories
});
// but harder to get the posts for a category: Category has no 'posts'
// path to populate, so each category needs its own Post query.
Category.find().exec(function(err, categories) {
  // handle err
  async.forEach(categories, function(category, done) {
    Post.find().where('categories').in([category.id]).exec(function(err, posts) {
      category.posts = posts;
      done(err);
    });
  }, function(err) {
    // ... you have categories with posts
  });
});
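The loop above issues one Post query per category. One alternative is to fetch every post that references any of the category ids in a single query (for example with Post.find().where('categories').in(ids)) and then group the results in memory. The sketch below shows only that grouping step, on plain objects; attachRelated is a hypothetical helper, not part of mongoose or Keystone.

```javascript
// Hypothetical helper, not part of mongoose or Keystone.
// docs:    secondary-side documents, e.g. categories ({ id, ... })
// related: primary-side documents that hold the refs, e.g. posts
// refPath: path on the related docs storing the id(s), e.g. 'categories'
// path:    where to attach the grouped results on each doc, e.g. 'posts'
function attachRelated(docs, related, refPath, path) {
  // One empty bucket per document, keyed by its id as a string.
  const byId = new Map(docs.map((doc) => [String(doc.id), []]));
  for (const item of related) {
    // Accept a single ref, an array of refs, or a missing value.
    const value = item[refPath];
    const refs = value == null ? [] : [].concat(value);
    for (const ref of refs) {
      const bucket = byId.get(String(ref));
      if (bucket) bucket.push(item);
    }
  }
  for (const doc of docs) doc[path] = byId.get(String(doc.id));
  return docs;
}

const categoryDocs = [{ id: 'c1', name: 'News' }, { id: 'c2', name: 'Tips' }];
const postDocs = [
  { id: 'p1', title: 'A', categories: ['c1'] },
  { id: 'p2', title: 'B', categories: ['c1', 'c2'] },
  { id: 'p3', title: 'C', categories: [] },
];
attachRelated(categoryDocs, postDocs, 'categories', 'posts');
```

With real mongoose documents the ids would be ObjectIds, which is why the sketch compares them as strings.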
The solution
I have developed a Relationship feature in Keystone that lets you populate one-sided relationships as if they were two-sided, by specifying a relationship on the secondary schema, and propose we implement it (better) in mongoose itself.
In Keystone there is a (currently undocumented) populateRelated method that is created on Lists. Here’s an example of how this works:
// This is how you could use Keystone's features to simplify
// managing the relationship. Posts have gained authors.
var keystone = require('keystone');
var Post = new keystone.List({
  autokey: { path: 'slug', from: 'title', unique: true }
}).add({
  title: String,
  contents: keystone.Types.Markdown,
  author: { type: keystone.Types.Relationship, ref: 'User' },
  categories: { type: keystone.Types.Relationship, ref: 'Category', many: true }
});
Post.register();
var Category = new keystone.List().add({
  name: String
});
Category.relationship({ path: 'posts', ref: 'Post', refPath: 'categories' });
Category.register();
// we can populate categories on posts using mongoose's populate
Post.model.find().populate('categories').exec(function(err, posts) { /* ... */ });
// there's one more step, but we can use keystone's populateRelated
// to achieve a similar effect for posts in categories
Category.model.find().exec(function(err, categories) {
  keystone.populateRelated(categories, 'posts', function(err) {
    // ... you have categories with posts
  });
});
// if you've got a single document you want to populate a relationship on, it's neater
Category.model.findOne().exec(function(err, category) {
  category.populateRelated('posts', function(err) {
    // posts is populated
  });
});
// if you also wanted to populate the author on the posts loaded for each category,
// you can do that too - it uses mongoose's populate method because the relationship
// is stored in a path on the Post
Category.model.findOne().exec(function(err, category) {
  category.populateRelated('posts[author]', function(err) {
    // posts is populated, and each author on each post is populated
  });
});
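To make the 'posts[author]' string above more concrete, here is a rough sketch of how such a spec could be split into the related path and the nested paths to populate on it. This is not Keystone's actual parser. parseRelatedSpec is a made-up name, and the handling of several comma- or space-separated paths inside the brackets is an assumption.

```javascript
// Hypothetical parser; Keystone's own parsing may differ.
// 'posts'         -> { path: 'posts', populate: [] }
// 'posts[author]' -> { path: 'posts', populate: ['author'] }
function parseRelatedSpec(spec) {
  // A path without brackets, optionally followed by one [ ... ] group.
  const match = /^([^\[\]]+)(?:\[([^\[\]]*)\])?$/.exec(spec.trim());
  if (!match) throw new Error(`invalid relationship spec: ${spec}`);
  // Assumption: nested paths are separated by commas and/or whitespace.
  const nested = (match[2] || '').split(/[\s,]+/).filter(Boolean);
  return { path: match[1].trim(), populate: nested };
}

const spec = parseRelatedSpec('posts[author]');
```

The related documents would be loaded for spec.path first, and then each entry in spec.populate could be handed to mongoose's ordinary populate, since those paths are stored on the Post.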
The biggest downside to implementing it outside of mongoose’s populate functionality is it can’t be queued before the query is executed, so it has to be used similarly to the populate method that is available to Document objects. There’s also a (currently horribly inefficient) method on keystone itself to run this for all documents in an array.
If we brought this across, you could call a method on the mongoose Schema telling it that another Schema holds a ref to it in a path (simple or array, for one-to-many or many-to-many, which could be detected from the related Schema). This relationship would then be taken into account by the populate method, and (although the underlying queries would be different) it would be treated like a path storing { type: mongoose.Schema.Types.ObjectId, ref: '(primary Model)' }.
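As a rough illustration of the "could be detected from the related Schema" part, the sketch below inspects a plain schema definition object (the same shape you would pass to new mongoose.Schema) and reports whether the referencing path is a single ref or an array of refs. describeRelationship is a hypothetical helper, and the 'ObjectId' strings stand in for mongoose.Schema.Types.ObjectId so the snippet runs without mongoose.

```javascript
// Plain definition object; 'ObjectId' is a stand-in for
// mongoose.Schema.Types.ObjectId so this runs without mongoose.
const postDefinition = {
  title: String,
  author: { type: 'ObjectId', ref: 'User' },
  categories: [{ type: 'ObjectId', ref: 'Category' }],
};

// Hypothetical helper: checks that refPath on the related definition
// points at expectedRef, and reports whether it stores many refs.
function describeRelationship(definition, refPath, expectedRef) {
  const entry = definition[refPath];
  const many = Array.isArray(entry);
  const field = many ? entry[0] : entry;
  if (!field || field.ref !== expectedRef) {
    throw new Error(`'${refPath}' does not reference ${expectedRef}`);
  }
  return { refPath, many };
}

const toCategory = describeRelationship(postDefinition, 'categories', 'Category');
const toUser = describeRelationship(postDefinition, 'author', 'User');
```

Here toCategory.many is true (each post can reference several categories) and toUser.many is false, so only the ref name and path would need to be declared on the secondary Schema.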
Here’s how I propose this would work if implemented natively in mongoose:
// This is the same as the Keystone relationships example, but written as if
// the functionality were built into mongoose itself. Much simpler to use.
// Note: Schema#relationship is the proposed method; it does not exist in mongoose.
var mongoose = require('mongoose');
var PostSchema = new mongoose.Schema({
  title: String,
  slug: String,
  contents: String,
  categories: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Category' }]
});
var Post = mongoose.model('Post', PostSchema);
var CategorySchema = new mongoose.Schema({
  name: String
});
CategorySchema.relationship({ path: 'posts', ref: 'Post', refPath: 'categories' });
var Category = mongoose.model('Category', CategorySchema);
// with proper integration with populate, no need for another nested function!
Category.find().populate('posts').exec(function(err, categories) {
  // ... you have categories with posts
});
Or alternatively:
// An alternative way of configuring relationships on Schemas in mongoose,
// if it were implemented as a SchemaType instead of a separate method.
// Not sure if this would be better or worse to implement in mongoose!
var PostSchema = new mongoose.Schema({
  title: String,
  slug: String,
  contents: String,
  categories: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Category' }]
});
var Post = mongoose.model('Post', PostSchema);
var CategorySchema = new mongoose.Schema({
  name: String,
  posts: { type: mongoose.Schema.Types.Relationship, ref: 'Post', refPath: 'categories' }
});
var Category = mongoose.model('Category', CategorySchema);
// with proper integration with populate, no need for another nested function!
Category.find().populate('posts').exec(function(err, categories) {
  // ... you have categories with posts
});
If this is something that would be welcomed in mongoose I’d be happy to help implement it (but I’ve looked through the code and might need a primer from somebody who understands how populate works better than I do). The method in Keystone is currently fairly rough; it works, but the performance (and implementation) leaves a bit to be desired. If this stays a Keystone-specific feature, I’d love someone with more experience wrangling performance in mongoose to help me improve it.
The actual implementation in Keystone can be found here:
Top GitHub Comments
@bonesoul see populate virtuals
any updates on this?