Skipping oplog entry: '<db_name>.tmp.agg_out.1' is not in the namespace configuration.
Wanted to report an issue and give my fix.
I have a collection that I am trying to dump into Elasticsearch while renaming it, so that the db collection name is used for the ES index name and type. However, I got the following log message (tens of thousands of times):
2017-04-10 14:56:43,809 [DEBUG] mongo_connector.oplog_manager:175 - OplogThread: Skipping oplog entry: 'production.tmp.agg_out.1' is not in the namespace configuration.
Here is my config file with the desired namespaces handling:
{
  "mainAddress": "db-address.7495.mongodbdns.com:27000",
  "oplogFile": "oplog.timestamp",
  "noDump": true,
  "batchSize": -1,
  "verbosity": 3,
  "continueOnError": false,
  "logging": {
    "type": "file",
    "filename": "mongo-connector.log"
  },
  "authentication": {
    "adminUsername": "app-admin",
    "passwordFile": "mongo-connector.pwd"
  },
  "namespaces": {
    "production.test": {
      "rename": "tests.test"
    }
  },
  "docManagers": [
    {
      "docManager": "elastic2_doc_manager",
      "targetURL": "127.0.0.1:9200",
      "bulkSize": 5000
    }
  ]
}
After some digging I found that producing a collection from the MongoDB aggregation $out stage makes temp collections named like db_name.tmp.agg_out.n. I did in fact produce the test collection from an aggregation with $out. I figured the quickest thing to do was to drop the tmp.agg_out.n collection, but I couldn’t find it!
Instead I did db.repairDatabase() and ran mongo-connector again, which worked without issue.
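To illustrate why the temp collection shows up in the log, here is a minimal Python sketch of a namespace filter. It is not mongo-connector's actual code: the function name route_namespace and the simplified namespaces mapping are hypothetical, and only the namespace strings come from the config and log above.

```python
def route_namespace(ns, namespaces):
    """Return the rename target for a configured namespace, or None to skip.

    Hypothetical helper: `namespaces` maps a source namespace to its target,
    a simplified form of the "namespaces" section in the config above.
    """
    return namespaces.get(ns)


# Simplified form of the config: production.test is renamed to tests.test.
namespaces = {"production.test": "tests.test"}

# Namespaces of oplog entries as they might appear after an $out aggregation.
oplog_namespaces = [
    "production.tmp.agg_out.1",  # temp collection written by $out
    "production.test",
    "production.tmp.agg_out.1",
]

skipped = [ns for ns in oplog_namespaces if route_namespace(ns, namespaces) is None]
routed = [route_namespace(ns, namespaces) for ns in oplog_namespaces
          if route_namespace(ns, namespaces) is not None]

for ns in skipped:
    print("Skipping oplog entry: '%s' is not in the namespace configuration." % ns)
```

Under this sketch, every oplog entry written to the temp collection is skipped, so a large $out run would produce one "Skipping" message per entry even after the temp collection itself no longer exists.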
Issue Analytics
- State:
- Created 6 years ago
- Comments:5 (3 by maintainers)
noDump causes mongo-connector to start tailing the oplog from the oldest entry. This might be changed to the newest entry in a future release.
I’m going to close this as a duplicate of https://github.com/mongodb-labs/mongo-connector/issues/586 and https://github.com/mongodb-labs/mongo-connector/issues/305.
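As a rough illustration of the point about the starting position, the following Python sketch counts how many entries a tailer would skip and process depending on where it starts reading. The tail function, the integer timestamps, and the sample oplog are all hypothetical; this is not how mongo-connector is implemented internally.

```python
def tail(oplog, start_ts, configured):
    """Walk oplog entries with ts >= start_ts; return (skipped, processed)."""
    skipped = processed = 0
    for entry in oplog:
        if entry["ts"] < start_ts:
            continue  # before the starting position, never read
        if entry["ns"] in configured:
            processed += 1
        else:
            skipped += 1  # would log a "Skipping oplog entry" message
    return skipped, processed


# Hypothetical oplog: five old writes to a temp collection, then two
# writes to the configured collection.
oplog = [{"ts": t, "ns": "production.tmp.agg_out.1"} for t in range(1, 6)]
oplog += [{"ts": 6, "ns": "production.test"}, {"ts": 7, "ns": "production.test"}]
configured = {"production.test"}

from_oldest = tail(oplog, oplog[0]["ts"], configured)   # start at oldest entry
from_newest = tail(oplog, oplog[-1]["ts"], configured)  # start at newest entry
print(from_oldest, from_newest)
```

In this toy setup, starting from the oldest entry means reading through every unrelated write still held in the oplog, while starting from the newest entry skips none of them.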
I see. Let me outline what my actual problem is.
When I use noDump: false, the documents dump in a matter of seconds, I get an oplog.timestamp file, and it tails the oplog indefinitely with heartbeat messages. When I use noDump: true (so that I can get a timestamp file alone, I don’t care about dumping atm), it logs all these “skipping” messages, of which there are 100s of millions because the imps coll is huge.
That is why I am confused here: noDump: false seems to be getting right to the point and dumping the target collection, and noDump: true is running over and skipping a bunch of oplog entries. While I haven’t yet waited more than a few minutes, using noDump: true is taking significantly longer when my expectation, though I am naive to the internals, is that it would take less time.
Am I understanding the purpose of noDump?