Low performance of stream()

I tested summing over 10 million documents with Mongoose's stream() and with the mongodb-native driver's stream(). The result is really disappointing. My results and code are below.

Result:

mongoose.stream()
----------sum is: 49999995000000--------------
----------Time is: 325095--------------

mongodb-native
----------sum is: 49999995000000--------------
----------Time is: 82294--------------

My code:

mongoose.stream()

var mongoose = require('mongoose');
var db = mongoose.createConnection('localhost', 'Yanxin');
var schema = mongoose.Schema({data: mongoose.Schema.Types.Mixed}, {_id: false});
var collection = 'fiverecord';
var model = db.model(collection, schema);
var sum = 0;
var timeBegin = new Date().getTime();
var stream = model.find({'_id.rdv': '/test/', 'data.number': {$exists: true}}, {'_id': 0}).stream();
stream.on('data', function(doc) {
    sum += doc.data.number;
}).on('error', function(err) {
    console.log('<<< err is: ' + err);
}).on('close', function() {
    console.log('----------sum is: ' + sum + '--------------');
    console.log('----------Time is: ' + (new Date().getTime() - timeBegin) + '--------------');
    db.close();
});

mongodb-native

var Db = require('mongodb').Db,
    Server = require('mongodb').Server,
    assert = require('assert');
var db1 = new Db('DBname', new Server("127.0.0.1", 27017, {auto_reconnect: false, poolSize: 5}), {w:0, native_parser: false});
var timeBegin = new Date().getTime();
db1.open(function(err, db) {
    db.createCollection('CollectionNames', function(err, collection) {
        assert.equal(null, err);
        var stream = collection.find({'_id.rdv': '/test/', 'data.number': {$exists: true}}, {_id: 0, data: 1}).stream();
        var sum = 0;
        stream.on("data", function(item) {
            sum += item.data.number;
        });
        stream.on('error', function(err)
        {
            console.log(err);
        });
        stream.on("close", function() {
            console.log('----------sum is: ' + sum + '--------------');
            console.log('----------Time is: ' + (new Date().getTime() - timeBegin) + '--------------');
            db.close();
        });
    });
});

I don’t really understand why there is such a big difference. Can anyone explain it?

Top GitHub Comments

aheckmann commented, Dec 12, 2012

A few things. First, this is not surprising. Mongoose is an Object Document Mapper: it wraps each document returned from MongoDB in a custom object decorated with getters, setters, hooked methods, validation, etc. This has a cost.
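To get a feel for where that per-document cost comes from, here is a rough sketch in plain JavaScript. WrappedDoc, makeDocs, sumRaw and sumWrapped are hypothetical names made up for this illustration, and the wrapper is far simpler than a real Mongoose document, so the numbers say nothing about Mongoose itself. It only shows that allocating a wrapper and going through a getter for every document is extra work compared with reading plain objects.

```javascript
// Illustration only: WrappedDoc is a hypothetical stand-in for an ODM document,
// not Mongoose's actual implementation.
function WrappedDoc(raw) {
    this._raw = raw;
}
Object.defineProperty(WrappedDoc.prototype, 'data', {
    get: function() { return this._raw.data; },
    set: function(value) { this._raw.data = value; }
});

// Build plain objects shaped like the documents in the issue.
function makeDocs(count) {
    var docs = [];
    for (var i = 0; i < count; i++) {
        docs.push({data: {number: i}});
    }
    return docs;
}

// Sum by reading the plain objects directly.
function sumRaw(docs) {
    var sum = 0;
    for (var i = 0; i < docs.length; i++) {
        sum += docs[i].data.number;
    }
    return sum;
}

// Sum after wrapping every object first: one extra allocation
// and one getter call per document.
function sumWrapped(docs) {
    var sum = 0;
    for (var i = 0; i < docs.length; i++) {
        sum += new WrappedDoc(docs[i]).data.number;
    }
    return sum;
}

var docs = makeDocs(1000000);
var t0 = Date.now();
var rawSum = sumRaw(docs);
var t1 = Date.now();
var wrappedSum = sumWrapped(docs);
var t2 = Date.now();
console.log('raw:     sum=' + rawSum + ' time=' + (t1 - t0) + 'ms');
console.log('wrapped: sum=' + wrappedSum + ' time=' + (t2 - t1) + 'ms');
```

Timings will vary by machine and Node version, and the JIT may narrow the gap for a wrapper this small. A real ODM document does much more per instance than this toy wrapper.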

I ran my test on a collection with 4,485,326 documents.

With the default settings, mongoose ran on average more than 3x slower than the raw driver.

The first thing to do to tweak performance is to adjust your batchSize option. With mongoose this is exposed through the query.batchSize(1000) method. On the driver you pass it as an option to collection.find(criteria, fields, { batchSize: 1000 }).
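As a back-of-the-envelope sketch of why batchSize matters: under the simplifying assumption that every batch costs one round trip to the server, the number of round trips is the document count divided by the batch size, rounded up. roundTrips is a hypothetical helper name, and the model ignores any size limits the server applies to a batch.

```javascript
// Simplified model (an assumption): each batch costs one round trip.
function roundTrips(totalDocs, batchSize) {
    return Math.ceil(totalDocs / batchSize);
}

// The 4,485,326-document collection from this comment, with batchSize 1000.
console.log(roundTrips(4485326, 1000)); // 4486

// The same collection with a hypothetical smaller batch size of 100.
console.log(roundTrips(4485326, 100)); // 44854
```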

Here are my results running with batchSize set to 1000.

mongoose
----------sum is: 10059072420475--------------
----------Time is: 101807--------------
running native driver test...

native
----------sum is: 10059072420475--------------
----------Time is: 29238--------------

The next thing to do is enable lean reads with mongoose. This bypasses the document mapper part of mongoose and returns the raw documents directly from the driver. This is enabled by calling query.lean(). The final mongoose code looks like:

var stream = A.find({'data.number': {$exists: true}}, {'_id': 0, data: 1}).lean().batchSize(1000).stream();
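For completeness, a minimal sketch of how that line might sit inside the summing loop from the original post. sumNumbers is a hypothetical helper name. It assumes only the lean(), batchSize() and stream() query methods mentioned above and the 'data', 'error' and 'close' stream events. The fake query at the bottom is a stand-in so the sketch can run without a database; it is not part of Mongoose.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper: sums doc.data.number over a lean, batched query stream.
function sumNumbers(query, done) {
    var sum = 0;
    var finished = false;
    var stream = query.lean().batchSize(1000).stream();
    stream.on('data', function(doc) {
        sum += doc.data.number;
    }).on('error', function(err) {
        // Report the first error only.
        if (!finished) { finished = true; done(err); }
    }).on('close', function() {
        if (!finished) { finished = true; done(null, sum); }
    });
    return stream;
}

// Stand-in for a real query, used here only to exercise the helper.
var fake = new EventEmitter();
var query = {
    lean: function() { return this; },
    batchSize: function(n) { return this; },
    stream: function() { return fake; }
};

sumNumbers(query, function(err, sum) {
    if (err) { return console.log('<<< err is: ' + err); }
    console.log('----------sum is: ' + sum + '--------------');
});
fake.emit('data', {data: {number: 2}});
fake.emit('data', {data: {number: 3}});
fake.emit('close');
```

With a real model, the call would presumably look like sumNumbers(A.find({'data.number': {$exists: true}}, {'_id': 0, data: 1}), callback).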

The results:

mongoose
----------sum is: 10059072420475--------------
----------Time is: 21731--------------
running native driver test...

native
----------sum is: 10059072420475--------------
----------Time is: 25689--------------

That mongoose ran faster than the driver is a fluke; the driver cannot be faster than itself 😃 But you get the idea.
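To put the timings quoted in this thread side by side, the short sketch below computes the mongoose-to-native time ratio for each run. slowdown is a hypothetical helper name. The first run comes from the original post and the other two from this comment, so they were measured on different collections and machines; only the ratio within each run is meaningful.

```javascript
// Ratio of mongoose time to native driver time (both in milliseconds).
function slowdown(mongooseMs, nativeMs) {
    return mongooseMs / nativeMs;
}

var runs = [
    {label: 'original post, no tuning', mongoose: 325095, native: 82294},
    {label: 'batchSize(1000)', mongoose: 101807, native: 29238},
    {label: 'lean() + batchSize(1000)', mongoose: 21731, native: 25689}
];

runs.forEach(function(run) {
    var ratio = slowdown(run.mongoose, run.native).toFixed(2);
    console.log(run.label + ': ' + ratio + 'x');
});
// original post, no tuning: 3.95x
// batchSize(1000): 3.48x
// lean() + batchSize(1000): 0.85x
```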

0cv commented, Oct 30, 2015

Thank you for the explanation Valeri, I may want to avoid reinventing the wheel myself 😃
