Low performance of stream()
I tested summing a field over 10 million documents with Mongoose's stream() and with the mongodb-native driver's stream(). The result is really disappointing. My results and code are below.
Result (times are in milliseconds):
- mongoose.stream() ----------sum is: 49999995000000-------------- ----------Time is: 325095--------------
- mongodb-native: ----------sum is: 49999995000000-------------- ----------Time is: 82294--------------
My code:
- mongoose.stream()
var mongoose = require('mongoose');

var db = mongoose.createConnection('localhost', 'Yanxin');
var schema = mongoose.Schema({data: mongoose.Schema.Types.Mixed}, {_id: false});
var collection = 'fiverecord';
var model = db.model(collection, schema);

var sum = 0;
var timeBegin = new Date().getTime();
// Stream the matching documents one at a time instead of buffering them all.
var stream = model.find({'_id.rdv': '/test/', 'data.number': {$exists: true}}, {'_id': 0}).stream();
stream.on('data', function(doc) {
    sum += doc.data.number;
}).on('error', function(err) { // the stream emits 'error', not 'err'
    console.log('<<< err is: ' + err);
}).on('close', function() {
    console.log('----------sum is: ' + sum + '--------------');
    console.log('----------Time is: ' + (new Date().getTime() - timeBegin) + '--------------');
    db.close();
});
- mongodb-native
var mongodb = require('mongodb'),
    Db = mongodb.Db,
    Server = mongodb.Server, // Server is used below, so it must be imported too
    assert = require('assert');

var db1 = new Db('DBname', new Server("127.0.0.1", 27017, {auto_reconnect: false, poolSize: 5}), {w:0, native_parser: false});
var timeBegin = new Date().getTime();
db1.open(function(err, db) {
    assert.equal(null, err);
    db.createCollection('CollectionNames', function(err, collection) {
        assert.equal(null, err);
        // Stream the matching documents, projecting only the data field.
        var stream = collection.find({'_id.rdv': '/test/', 'data.number': {$exists: true}}, {_id: 0, data: 1}).stream();
        var sum = 0;
        stream.on("data", function(item) {
            sum += item.data.number;
        });
        stream.on('error', function(err) {
            console.log(err);
        });
        stream.on("close", function() {
            console.log('----------sum is: ' + sum + '--------------');
            console.log('----------Time is: ' + (new Date().getTime() - timeBegin) + '--------------');
            db.close();
        });
    });
});
I don't really understand what causes this big difference. Can anyone explain it?
Issue Analytics
- State:
- Created 11 years ago
- Comments:6
Top GitHub Comments
A few things. First, this is not surprising. Mongoose is an Object Document Mapper: it wraps each document returned from MongoDB in a custom object decorated with getters, setters, hooked methods, validation, and so on. This has a cost.
I ran my test on a collection with 4,485,326 documents.
With the default settings, mongoose ran more than 3x slower than the raw driver on average.
The first thing to do to tweak performance is to adjust your batchSize option. With mongoose this is exposed through the query.batchSize(1000) method. On the driver you pass it as an option to collection.find(criteria, fields, { batchSize: 1000 }).
Here are my results running with batchSize set to 1000.
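As a rough illustration of the driver-side option described above (this is not the commenter's original code or results), the sketch below passes batchSize as the third argument to find() and sums the streamed values. The helper name sumWithBatchSize and its callback shape are made up for this example, and `collection` is assumed to be a collection object from the legacy mongodb-native API used in the question.

```javascript
// Sketch only: `collection` is assumed to be a legacy mongodb-native collection
// whose find(criteria, fields, options) returns a cursor exposing stream().
// sumWithBatchSize is a hypothetical helper, not part of the driver.
function sumWithBatchSize(collection, batchSize, done) {
  var criteria = { 'data.number': { $exists: true } };
  var fields = { _id: 0, data: 1 };
  var sum = 0;
  var finished = false;

  // Make sure the callback fires only once, even if 'error' is followed by 'close'.
  function finish(err) {
    if (finished) return;
    finished = true;
    if (err) return done(err);
    done(null, sum);
  }

  // batchSize sets how many documents the server returns per round trip.
  var stream = collection.find(criteria, fields, { batchSize: batchSize }).stream();

  stream.on('data', function (item) {
    sum += item.data.number;
  });
  stream.on('error', finish);
  stream.on('close', function () {
    finish(null);
  });

  return stream;
}
```

It could be called as sumWithBatchSize(collection, 1000, function (err, sum) { ... }) inside the createCollection callback from the question.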
The next thing to do is enable lean reads with mongoose. This bypasses the document mapper part of mongoose and returns the raw documents directly from the driver. This is enabled by calling query.lean().
The final mongoose code looks like:
The results:
That mongoose ran faster than the driver is a fluke; the driver cannot be faster than itself 😃 But you get the idea.
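Putting the two suggestions from this comment together, a mongoose query with both batchSize and lean might look roughly like the sketch below. It is an approximation under stated assumptions, not the commenter's exact code: sumWithLeanStream is a hypothetical helper, and `model` is assumed to be a Mongoose model from a version where Query#stream() is available, as in the question (newer Mongoose versions use Query#cursor() instead).

```javascript
// Sketch only: `model` is assumed to be a Mongoose model whose queries expose
// batchSize(), lean() and stream(). sumWithLeanStream is a hypothetical helper.
function sumWithLeanStream(model, criteria, done) {
  var sum = 0;
  var finished = false;

  // Make sure the callback fires only once, even if 'error' is followed by 'close'.
  function finish(err) {
    if (finished) return;
    finished = true;
    if (err) return done(err);
    done(null, sum);
  }

  var stream = model
    .find(criteria, { _id: 0 })
    .batchSize(1000) // fetch 1000 documents per round trip
    .lean()          // skip document hydration, return plain objects
    .stream();

  stream.on('data', function (doc) {
    // With lean(), doc is a plain JavaScript object, not a Mongoose document.
    sum += doc.data.number;
  });
  stream.on('error', finish);
  stream.on('close', function () {
    finish(null);
  });

  return stream;
}
```

With the model from the question, this could be invoked as sumWithLeanStream(model, {'_id.rdv': '/test/', 'data.number': {$exists: true}}, function (err, sum) { ... }).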
Thank you for the explanation Valeri, I may want to avoid reinventing the wheel myself 😃