$lookup should reference join collection instead of clone data.

I’m not sure what’s happening in lookup.js:

each(joinColl, (obj, i) => {
  let k = hashCode(obj[foreignField])
  hash[k] = hash[k] || []
  hash[k].push(i)
})

each(collection, (obj) => {
  let k = hashCode(obj[localField])
  let indexes = hash[k] || []
  obj[asField] = map(indexes, (i) => clone(joinColl[i]))
  result.push(obj)
})

I’m confused by this specifically:

obj[asField] = map(indexes, (i) => clone(joinColl[i]))

Is the $lookup data being cloned, or added by reference? Cloning is trivial for small datasets, of course, but it is a huge (and unnecessary) overhead on big datasets.

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 15 (7 by maintainers)

Top GitHub Comments

1 reaction
kofrasa commented, Sep 6, 2017

Spot on @Redsandro.

Your assumptions are correct and I very much like the breakdown. It describes how things should work.

1 reaction
kofrasa commented, Sep 5, 2017

To answer your question, the clone function does not maintain state.

Cloning currently happens on demand, but for each stage. I considered cloning once and then keeping a history for subsequent stages, but decided against it because of the complexity and significant code size it would introduce. Even with that in place, it would not address the $lookup case, since the collection being cloned is the secondary data source. Too many decision points open up with this approach, and it is not always clear what the right thing to do is.

mingo does reference values of the input data, but it will return a new object with references to the unchanged parts of the original if an operation modifies, adds, or removes a value in the original object. So what you are hypothesizing is what happens in some cases. Cloning the whole object is therefore not necessary, but it is the simplest and safest approach to take in some cases.

For example, given the object {a: {b: {c: {d: {e: 1}}}}}, if the value for key e is changed, the entire object must be cloned. Given another object, say {x: 3, a: {b: {c: {d: {e: 1}}}}}, if we change x to 4, then we must create a new object with the updated value that references the existing value of a, such that {x: 4, a: <ref-a-val>}.
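As a rough illustration of the two cases above, here is a minimal copy-on-write sketch. The setIn helper is hypothetical and not part of mingo; it only shows the general idea that objects along the modified path are copied while every other branch is shared by reference.

```javascript
// Hypothetical helper (an assumption, not mingo's API): returns a copy of
// `obj` with the value at `path` replaced. Only the objects along `path`
// are copied; all other branches are shared with the original.
function setIn(obj, path, value) {
  if (path.length === 0) return value
  const [key, ...rest] = path
  return { ...obj, [key]: setIn(obj[key], rest, value) }
}

const original = { x: 3, a: { b: { c: { d: { e: 1 } } } } }

// Changing a top-level key copies only the root object; `a` is shared.
const changedX = setIn(original, ['x'], 4)
console.log(changedX.a === original.a) // true

// Changing the deepest key copies every object on the path down to it.
const changedE = setIn(original, ['a', 'b', 'c', 'd', 'e'], 2)
console.log(changedE.a === original.a) // false
console.log(original.a.b.c.d.e)        // 1, the original is untouched
```

In the second case every nested object lies on the path to e, so nothing can be shared, which matches the remark that the entire object must be cloned.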

I found a bug in $lookup (https://github.com/kofrasa/mingo/issues/60): it does not do what I describe above but instead modifies the original. The fix should also remove the need to clone the join collection, which will address your use case.

I am marking this issue as a bug.

Thanks for reporting and taking the time to discuss.
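For readers following along, the behavior described in the comment above (do not mutate the input documents, do not clone the join collection) could look roughly like the sketch below. This is not mingo's actual fix. The lookup function and its parameter order are assumptions made for illustration, and it uses a plain Map keyed on the raw field value instead of hashCode, so it assumes the join keys are primitives.

```javascript
// Hypothetical sketch (not mingo's implementation): a left outer equi-join
// that references join documents instead of cloning them, and returns new
// result objects instead of mutating the input collection.
function lookup(collection, joinColl, localField, foreignField, asField) {
  // Index the join collection by foreign field value.
  // Assumption: keys are primitives, so a Map can compare them directly.
  const index = new Map()
  for (const doc of joinColl) {
    const key = doc[foreignField]
    if (!index.has(key)) index.set(key, [])
    index.get(key).push(doc) // store a reference, not a copy
  }

  // Shallow-copy each input document and attach the matches. The array is
  // fresh per result, but its elements are the original join documents.
  return collection.map((doc) => ({
    ...doc,
    [asField]: [...(index.get(doc[localField]) || [])]
  }))
}

const orders = [{ _id: 1, item: 'a' }, { _id: 2, item: 'z' }]
const inventory = [{ sku: 'a', qty: 5 }, { sku: 'b', qty: 7 }]
const joined = lookup(orders, inventory, 'item', 'sku', 'stock')

console.log(joined[0].stock[0] === inventory[0]) // true, joined by reference
console.log(joined[1].stock.length)              // 0, no match for 'z'
console.log('stock' in orders[0])                // false, input not mutated
```

The trade-off is that callers who later mutate a joined document would also change the join collection, which is why a later stage that modifies values still needs to copy the parts it touches.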

