MongoDB cluster issues: retry on disconnect does not fail over, and write errors do not fail over
In the event that a member of the replica set becomes unable to respond during a find(), insert(), or similar operation, there is no automatic failover. The operation fails with an error. Our preference would be to automatically retry it.
According to notes by Thomas Chraibi based on input from Charles Sarrazin of MongoDB, one approach to achieve that is to simply re-attempt the find() or insert() call in question. This will result in the MongoDB driver discovering a functioning replica set node to reconnect to, as in the following pseudocode:
const client = ...
const coll = client.db('cool').collection('llamas');

function attemptInsert(doc, callback) {
  coll.insert(doc, (err, result) => {
    if (err) {
      // you might want to check for some errors, as they might be unrecoverable
      // if (err.message === 'SomeImportantErrorType') return callback(new Error('unrecoverable!'));

      // now recursively call `attemptInsert` to perform server selection again
      // (note: as written, this retries immediately and without any limit)
      attemptInsert(doc, callback);
      return;
    }
    callback(null, result); // success
  });
}
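The pseudocode above retries immediately and without limit, so a prolonged outage would turn into a tight loop. A minimal sketch of a bounded variant with exponential backoff follows. The helper names `sleep` and `retryWithBackoff`, the option names, and the default values are all hypothetical and are not part of the MongoDB driver; the sketch only assumes that the operation passed in returns a promise.

```javascript
// Hypothetical helper (not part of the MongoDB driver): resolves after `ms`.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: retries an async operation a bounded number of times,
// doubling the delay after each failed attempt.
async function retryWithBackoff(operation, options = {}) {
  const {
    maxAttempts = 5,          // assumed limit; tune for your deployment
    baseDelayMs = 100,        // assumed first delay; doubles on each retry
    isRetryable = () => true  // caller-supplied classification of errors
  } = options;

  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Each new attempt gives the driver a chance to select another node.
      return await operation();
    } catch (err) {
      lastError = err;
      // Give up on unrecoverable errors or once the attempts are used up.
      if (!isRetryable(err) || attempt === maxAttempts) {
        break;
      }
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw lastError;
}
```

Assuming a driver version whose collection methods return a promise when no callback is passed, usage might look like `retryWithBackoff(() => coll.insertOne(doc), { maxAttempts: 5 })`. Bounding the attempts means a long outage eventually surfaces as an error instead of retrying forever.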
In addition, there is apparently some sort of issue with our autoReconnect configuration:
autoReconnect: true,
// retry forever
reconnectTries: Number.MAX_VALUE,
reconnectInterval: 1000
Apparently this will keep retrying the connection to a node that is down for as long as that node is down, which is not ideal.
However, it is unclear to me why this should occur when, according to the pseudocode provided above, find() and insert() operations apparently continue to make new connections to other nodes as needed.
So, more clarification is needed on the following points before implementation can be completed:
- In what situation does the autoReconnect behavior come into play?
- If it is undesirable, what approach would ensure we eventually get connected again to an appropriate node?
- If new find() and insert() operations already reconnect as needed, is there any value in using autoReconnect at all? What value would that be?
- What MongoDB errors can be safely classed as “this node is ill or unavailable,” as opposed to “you did something you should not have” (examples: oversize document, illegal operation, unique key violation, etc.)?
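To make the last question concrete, here is one possible shape for such a classification. This is a sketch, not an authoritative answer: the function name `classifyMongoError` and both sets are hypothetical. The numeric codes are ones I understand the server to use for the named conditions, and `MongoNetworkError` is the error name I understand the Node.js driver to use for connection failures. The lists are deliberately incomplete and should be verified against the documentation for the server and driver versions in use.

```javascript
// Assumed, incomplete list of server error codes that suggest the node is
// down, unreachable, or no longer primary. Verify before relying on it.
const NODE_UNAVAILABLE_CODES = new Set([
  6,     // HostUnreachable
  7,     // HostNotFound
  89,    // NetworkTimeout
  91,    // ShutdownInProgress
  189,   // PrimarySteppedDown
  9001,  // SocketException
  10107, // NotMaster
  11600, // InterruptedAtShutdown
  11602, // InterruptedDueToReplStateChange
  13435, // NotMasterNoSlaveOk
  13436  // NotMasterOrSecondary
]);

// Assumed, incomplete list of codes caused by the request itself.
const CALLER_ERROR_CODES = new Set([
  11000 // DuplicateKey (unique index violation)
]);

// Hypothetical classifier: returns 'caller', 'node-unavailable', or 'unknown'.
function classifyMongoError(err) {
  if (!err) {
    return 'unknown';
  }
  if (CALLER_ERROR_CODES.has(err.code)) {
    return 'caller';
  }
  if (err.name === 'MongoNetworkError' || NODE_UNAVAILABLE_CODES.has(err.code)) {
    return 'node-unavailable';
  }
  // Anything unrecognized is left for the caller to decide; retrying an
  // unknown error blindly could repeat a request that can never succeed.
  return 'unknown';
}
```

Returning 'unknown' rather than guessing keeps the decision explicit: only errors classed as 'node-unavailable' would be candidates for a retry.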
Issue Analytics
- Created 5 years ago
- Comments:10 (7 by maintainers)
Top GitHub Comments
We already fixed the retry on disconnect stuff, and mongo 3+ retryable writes are available via a mongodb URI, therefore this can be closed.
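For readers unfamiliar with the URI option mentioned in this comment, the following sketch builds a connection string with `retryWrites=true`, which is the standard connection string option for enabling retryable writes. The host names, database name, replica set name, and the `buildMongoUri` helper are hypothetical, chosen only for illustration. As far as I know, the server-side feature was introduced in MongoDB 3.6 and also needs a driver version that supports it.

```javascript
// Hypothetical helper: assembles a replica set connection string with
// retryable writes enabled through the `retryWrites` URI option.
function buildMongoUri({ hosts, database, replicaSet }) {
  const params = new URLSearchParams({ replicaSet, retryWrites: 'true' });
  return `mongodb://${hosts.join(',')}/${database}?${params.toString()}`;
}

// Hypothetical hosts and names, for illustration only.
const uri = buildMongoUri({
  hosts: ['db1.example.com:27017', 'db2.example.com:27017', 'db3.example.com:27017'],
  database: 'cool',
  replicaSet: 'rs0'
});

console.log(uri);
```

The resulting string would be passed to the driver's connect call in place of a URI without the option.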
That helps a lot! Thank you. We’ll discuss and explore.
We are not sharding so it sounds like the mongos caveat doesn’t apply.
I take your point that it’s possible the write could make it to the oplog twice (i.e. in some scenario be carried out twice) with a retry strategy that doesn’t rely on your new retryable writes. Our client may be OK with using retryable writes and the required driver and server versions, given that the apostrophe-db-mongo-3-driver module has shipped.
On Fri, Aug 17, 2018 at 9:17 AM, Matt Broadstone notifications@github.com wrote: