Low performance of stream()

I tested summing a field over 10 million documents, once with the mongoose stream() and once with the mongodb-native stream(). The result is really disappointing. Here are my results and my code.

Result (times in milliseconds):

  • mongoose.stream() ----------sum is: 49999995000000-------------- ----------Time is: 325095--------------
  • mongodb-native: ----------sum is: 49999995000000-------------- ----------Time is: 82294--------------

My code:

  • mongoose.stream()
var mongoose = require('mongoose');
var db = mongoose.createConnection('localhost', 'Yanxin');
var schema = mongoose.Schema({data: mongoose.Schema.Types.Mixed}, {_id: false});
var collection = 'fiverecord';
var model = db.model(collection, schema);
var sum = 0;
var timeBegin = new Date().getTime();
var stream = model.find({'_id.rdv': '/test/', 'data.number': {$exists: true}}, {'_id': 0}).stream();
stream.on('data', function(doc) {
    sum += doc.data.number;
}).on('error', function(err) {
    console.log('<<< err is: ' + err);
}).on('close', function() {
    console.log('----------sum is: ' + sum + '--------------');
    console.log('----------Time is: ' + (new Date().getTime() - timeBegin) + '--------------');
    db.close();
});
  • mongodb-native
var mongodb = require('mongodb'),
    Db = mongodb.Db,
    Server = mongodb.Server,
    assert = require('assert');
var db1 = new Db('DBname', new Server("127.0.0.1", 27017, {auto_reconnect: false, poolSize: 5}), {w:0, native_parser: false});
var timeBegin = new Date().getTime();
db1.open(function(err, db) {
    db.createCollection('CollectionNames', function(err, collection) {
        assert.equal(null, err);
        var stream = collection.find({'_id.rdv': '/test/', 'data.number': {$exists: true}}, {_id: 0, data: 1}).stream();
        var sum = 0;
        stream.on("data", function(item) {
            sum += item.data.number;
        });
        stream.on('error', function(err)
        {
            console.log(err);
        });
        stream.on("close", function() {
            console.log('----------sum is: ' + sum + '--------------');
            console.log('----------Time is: ' + (new Date().getTime() - timeBegin) + '--------------');
            db.close();
        });
    });
});

I don’t really understand why there is such a big difference. Can anyone explain it?

Issue Analytics

  • State: closed
  • Created 11 years ago
  • Comments: 6

Top GitHub Comments

aheckmann commented, Dec 12, 2012 (7 reactions)

A few things. First, this is not surprising. Mongoose is an Object Document Mapper: it wraps each document returned from MongoDB in a custom object decorated with getters, setters, hooked methods, validation, and so on. This has a cost.
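To get a feel for why per-document wrapping costs time, here is a toy sketch. It is only an analogy and not how mongoose is actually implemented: `WrappedDoc`, `makeDocs`, `sumRaw`, `sumWrapped` and `timeIt` are made-up names, and the wrapper does far less work than a real mongoose document.

```javascript
// Toy analogy only, NOT mongoose's real implementation.
// Each plain object is wrapped in an object whose `data` field is a getter,
// so every document costs one extra allocation and one extra function call.
function WrappedDoc(raw) {
  this._raw = raw;
}
Object.defineProperty(WrappedDoc.prototype, 'data', {
  get: function () { return this._raw.data; }
});

// Build n plain documents shaped like the ones in the question.
function makeDocs(n) {
  var docs = [];
  for (var i = 0; i < n; i++) {
    docs.push({ data: { number: i } });
  }
  return docs;
}

// Sum by reading the plain objects directly (what a raw read gives you).
function sumRaw(docs) {
  var sum = 0;
  for (var i = 0; i < docs.length; i++) {
    sum += docs[i].data.number;
  }
  return sum;
}

// Sum after wrapping every document first (a stand-in for the mapper step).
function sumWrapped(docs) {
  var sum = 0;
  for (var i = 0; i < docs.length; i++) {
    var doc = new WrappedDoc(docs[i]);
    sum += doc.data.number;
  }
  return sum;
}

// Run one variant and print its sum and elapsed milliseconds.
function timeIt(label, fn, docs) {
  var begin = Date.now();
  var sum = fn(docs);
  console.log(label + ': sum=' + sum + ', ms=' + (Date.now() - begin));
  return sum;
}

var docs = makeDocs(200000);
timeIt('raw', sumRaw, docs);
timeIt('wrapped', sumWrapped, docs);
```

Both variants produce the same sum. How large the timing gap is depends on the JavaScript engine, and a real mapper does much more per document than this single getter.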

I ran my test on a collection with 4,485,326 documents.

With the default settings, mongoose ran on average more than 3x slower than the raw driver.

The first thing to do to tweak performance is to adjust your batchSize option. With mongoose this is exposed through the query.batchSize(1000) method. On the driver you pass it as an option to collection.find(criteria, fields, { batchSize: 1000 }).
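As a rough sketch, the two ways of setting the batch size described above could be wrapped like this. The helper names `mongooseStream` and `nativeStream` are made up for illustration, and the model and collection are passed in so the snippet does not need a live connection. It assumes the same era of mongoose and driver APIs as this thread, where queries and cursors expose `stream()`.

```javascript
// Query taken from the thread: documents that have data.number, without _id.
var criteria = { 'data.number': { $exists: true } };
var fields = { _id: 0, data: 1 };

// mongoose: batch size is set with a chained query method.
// `model` is assumed to be a mongoose model created elsewhere.
function mongooseStream(model, size) {
  return model.find(criteria, fields).batchSize(size).stream();
}

// Native driver: batch size is passed in the options argument of find().
// `collection` is assumed to be a driver collection opened elsewhere.
function nativeStream(collection, size) {
  return collection.find(criteria, fields, { batchSize: size }).stream();
}
```

Usage would look like `mongooseStream(model, 1000)` or `nativeStream(collection, 1000)`, with the returned stream consumed through the same 'data', 'error' and 'close' handlers as in the question.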

Here are my results running with batchSize set to 1000.

mongoose
----------sum is: 10059072420475--------------
----------Time is: 101807--------------
running native driver test...

native
----------sum is: 10059072420475--------------
----------Time is: 29238--------------

The next thing to do is enable lean reads with mongoose. This bypasses the document mapper part of mongoose and returns the raw documents directly from the driver. This is enabled by calling query.lean(). The final mongoose code looks like:

var stream = A.find({'data.number': {$exists: true}}, {'_id': 0, data: 1}).lean().batchSize(1000).stream();

The results:

mongoose
----------sum is: 10059072420475--------------
----------Time is: 21731--------------
running native driver test...

native
----------sum is: 10059072420475--------------
----------Time is: 25689--------------

That mongoose ran faster than the driver here is a fluke; the driver cannot be faster than itself 😃 But you get the idea.
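For reference, the summing logic from the question can be written once and reused for either stream. This is a minimal sketch: `sumStream` is a hypothetical helper name, and a plain EventEmitter stands in for the real query stream so the snippet runs without a database. Note that the handler listens for 'error', the event name Node streams use.

```javascript
var EventEmitter = require('events').EventEmitter;

// Sum doc.data.number over every document the stream emits, then call
// done(err, sum) exactly once, on 'error' or on 'close'.
function sumStream(stream, done) {
  var sum = 0;
  var finished = false;

  function finish(err) {
    if (finished) return; // guard against 'error' followed by 'close'
    finished = true;
    done(err, sum);
  }

  stream.on('data', function (doc) {
    sum += doc.data.number;
  });
  stream.on('error', finish);
  stream.on('close', function () {
    finish(null);
  });
}

// Stand-in stream for demonstration; with mongoose you would instead pass
// the result of the lean().batchSize(1000).stream() query shown above.
var fake = new EventEmitter();
var timeBegin = Date.now();
sumStream(fake, function (err, sum) {
  if (err) {
    console.log('error: ' + err);
    return;
  }
  console.log('sum is: ' + sum + ', time is: ' + (Date.now() - timeBegin) + ' ms');
});
fake.emit('data', { data: { number: 1 } });
fake.emit('data', { data: { number: 2 } });
fake.emit('close');
```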

0cv commented, Oct 30, 2015 (0 reactions)

Thank you for the explanation, Valeri. I may want to avoid reinventing the wheel myself 😃
