Observed flakiness when using the "fast_executemany" option with the "DataDirect PostgreSQL ODBC Driver"
Hi there,
we ran into a specific issue when using CrateDB with ODBC, may have identified a flaw, and want to share the outcome of our investigation with you.
Problem report
When connecting to CrateDB’s PostgreSQL interface using unixODBC through its pyodbc binding and enabling the fast_executemany option, the Progress DataDirect PostgreSQL ODBC Driver shows flaky communication and synchronization behavior.
Everything works well when using PostgreSQL. Also, when using the vanilla psqlODBC PostgreSQL ODBC driver, no problems occurred.
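For readers unfamiliar with the option, the following is a minimal sketch of how fast_executemany is typically enabled through pyodbc. The DSN name `cratedb-datadirect`, the table `testdrive`, and the helper names `bulk_insert` and `run_against_dsn` are hypothetical and are not taken from the reproduction repository; only `cursor.fast_executemany` and `cursor.executemany` are pyodbc's own.

```python
def bulk_insert(connection, sql, rows, fast=True):
    """Insert all rows with a single executemany() call.

    `connection` is expected to behave like a pyodbc connection.
    With fast=True, pyodbc binds the parameters as arrays and sends
    them in bulk, which is the mode in which the flakiness was seen.
    """
    cursor = connection.cursor()
    try:
        cursor.fast_executemany = fast  # pyodbc cursor attribute
        cursor.executemany(sql, rows)
    finally:
        cursor.close()
    return len(rows)


def run_against_dsn(dsn="cratedb-datadirect"):
    """Hypothetical entry point; the DSN must exist in odbc.ini."""
    import pyodbc  # third-party, imported lazily

    connection = pyodbc.connect(f"DSN={dsn}", autocommit=True)
    try:
        # Table and column names are assumptions for illustration.
        return bulk_insert(
            connection,
            "INSERT INTO testdrive (id, name) VALUES (?, ?)",
            [(1, "foo"), (2, "bar")],
        )
    finally:
        connection.close()
```

Calling `bulk_insert` with `fast=False` on the same connection gives a baseline for comparing the two code paths.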
Details
The software versions we are using are:
- CrateDB 4.5.1
- PostgreSQL 13.2
- Docker 20.10.5
- Python 3.9.5
- pyodbc 4.0.30
- Progress DataDirect Connect ODBC PostgreSQL Wire Protocol Driver 7.1.6
- psqlODBC - ODBC driver for PostgreSQL 11.00.0000
Reproduction
The whole setup for investigating this issue is wrapped into a repository [1] in order to make reproduction effortless. Its README document also outlines the observations in more detail. The setup must be run on Linux or a respective emulated environment because it only includes ODBC drivers for Linux.
A visual representation of the flaky behavior in the specific scenario is attached in the form of a screenshot capturing the outcome of the test suite.
Thoughts
Maybe this is related to what we investigated on behalf of https://github.com/brianc/node-postgres/issues/2454 and https://github.com/brianc/node-postgres/issues/2455 and mitigated with https://github.com/crate/crate/pull/10979. However, that is really just a wild guess.
With kind regards, Andreas.
[1] https://github.com/amotl/cratedb-datadirect-odbc
/cc @hammerhead, @proddata, @jayeff
Top GitHub Comments
We have investigated this issue, and it seems that the Progress DataDirect ODBC driver expects that the response to a Bind/Execute/Close/Sync sequence (logically: binding values to a prepared statement) will fit into one data frame, so a flush should only happen after the Sync->ReadyForQuery outbound message. However, CrateDB flushes after every Execute->CommandComplete message to ensure correct outbound ordering of the messages, because CrateDB runs operations asynchronously internally.
To support this driver's expectation, we would have to ensure that a flush only happens after a Sync->ReadyForQuery message, but every approach to ensure that would lead to significant overhead (e.g. thread context switches) which all other clients would suffer from. On the other hand, we don’t think that this behaviour/expectation is correct, as a flush can always happen implicitly when some buffer reaches a size threshold at any of multiple layers. As far as we know, there is no strict rule about when a flush is allowed and when it is not.
We have now contacted the vendor of this proprietary driver to notify them about possibly incorrect behaviour of their driver.
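To illustrate why flush boundaries should not matter to a client, here is a minimal sketch of a reader for PostgreSQL wire protocol backend messages. It relies only on the documented framing (one type byte followed by a 4-byte big-endian length that includes itself) and buffers until ReadyForQuery arrives. The names `MessageBuffer`, `frame` and `read_until_ready` are hypothetical, and this is neither CrateDB's nor the driver's actual code.

```python
import struct


class MessageBuffer:
    """Accumulates raw bytes and returns complete backend messages."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk):
        """Add one network read; return the messages completed so far."""
        self._buf.extend(chunk)
        messages = []
        while len(self._buf) >= 5:
            msg_type = chr(self._buf[0])
            # The length field counts itself but not the type byte.
            (length,) = struct.unpack("!I", self._buf[1:5])
            total = 1 + length
            if len(self._buf) < total:
                break  # partial message, wait for the next read
            payload = bytes(self._buf[5:total])
            del self._buf[:total]
            messages.append((msg_type, payload))
        return messages


def frame(msg_type, payload=b""):
    """Encode one backend message (used here to build test input)."""
    return msg_type.encode("ascii") + struct.pack("!I", len(payload) + 4) + payload


def read_until_ready(chunks):
    """Consume reads until ReadyForQuery ('Z'); return the types seen."""
    buf = MessageBuffer()
    seen = []
    for chunk in chunks:
        for msg_type, _payload in buf.feed(chunk):
            seen.append(msg_type)
            if msg_type == "Z":
                return seen
    raise ConnectionError("stream ended before ReadyForQuery")


# Response to Bind/Execute/Close/Sync: BindComplete ('2'),
# CommandComplete ('C'), CloseComplete ('3'), ReadyForQuery ('Z').
response = (
    frame("2")
    + frame("C", b"INSERT 0 1\x00")
    + frame("3")
    + frame("Z", b"I")
)
```

Feeding `response` as a single chunk, or split at an arbitrary offset such as `[response[:7], response[7:]]`, yields the same message sequence, because the reader never assumes that one read corresponds to one logical response.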
Closing this here. As far as we can tell, this is a problem on the driver side. Solving it on the CrateDB side would require a workaround that would introduce a performance penalty affecting other drivers as well.