batch insert - too slow
I was testing an insert query like the one below (for a JDBC engine table):
insert into jdbc_test(a, b)
select number*3, number*5 from system.numbers n limit 5000
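For context, jdbc_test above is a table using ClickHouse's JDBC table engine, whose arguments are a datasource (or connection string), a remote schema, and a remote table. The original report doesn't include its definition; a minimal sketch of how it might have been declared, where the datasource name mysql-datasource and the remote schema/table names are placeholders, not from the original report:
create table jdbc_test (a Int32, b Int32)
engine = JDBC('mysql-datasource', 'test', 'test')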
I was using the MySQL and MSSQL drivers. In both cases the results were very slow - 200-300 rows per second (using a tiny table with 2 int columns). This is about the same speed as if you'd insert rows one by one with auto commit. rewriteBatchedStatements=true was set. I did a few more tests using plain Java against MySQL: if I turn off auto commit I can get 7-8K rows per second, even when inserting one by one. Using the batch API this goes up to 80K per second.
So it sounds like when the bridge is writing data it's using very small batches (or not using batches at all?). Regarding auto commit - it is true by default, however during batch writing I think it's more logical to start a transaction, though, to avoid partial inserts (on crash). It would also speed up inserts with smaller batches.
Here is a quick and dirty snippet for testing. It assumes there is a database "test" and a table "test" with (a int, b int) columns.
import java.sql.DriverManager;
import java.sql.SQLException;

/* usage
// copy 2000 rows 1 by 1, with auto commit
java --source 11 -cp "./mysql-connector-java-8.0.26.jar" jt.java 2000
// copy 50K rows in 1000-row batches with auto commit (after each batch insertion)
java --source 11 -cp "./mysql-connector-java-8.0.26.jar" jt.java 50000 1000
// copy 50K rows in 1000-row batches, single transaction
java --source 11 -cp "./mysql-connector-java-8.0.26.jar" jt.java 50000 1000 false
*/
class Jt {
    public static void main(String[] args) {
        var rowsToCopy = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        var batchSize = args.length > 1 ? Integer.parseInt(args[1]) : 1;
        boolean autoCommit = args.length > 2 ? Boolean.parseBoolean(args[2]) : true;
        try {
            long start = System.currentTimeMillis();
            var conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost/test?user=root&password=root&rewriteBatchedStatements=true");
            conn.setAutoCommit(autoCommit);
            var stmt = conn.prepareStatement("insert into test.test values(?,?)");
            if (batchSize == 1) {
                System.out.println("No batching");
                for (int i = 1; i <= rowsToCopy; i++) {
                    stmt.setInt(1, i * 3);
                    stmt.setInt(2, i * 5);
                    stmt.executeUpdate();
                }
            } else {
                System.out.println("batch size: " + batchSize);
                for (int i = 1; i <= rowsToCopy; i++) {
                    stmt.setInt(1, i * 3);
                    stmt.setInt(2, i * 5);
                    stmt.addBatch();
                    if (i % batchSize == 0)
                        stmt.executeBatch();
                }
                stmt.executeBatch(); // flush any remaining partial batch
            }
            if (!autoCommit)
                conn.commit();
            long finish = System.currentTimeMillis();
            long dur = finish - start;
            System.out.println(String.format("copied %d rows in %,d ms (%d rows per sec). Autocommit: %b",
                    rowsToCopy, dur, rowsToCopy * 1000L / dur, autoCommit));
            var s = conn.createStatement();
            var rs = s.executeQuery("select sum(1) from test.test");
            rs.next();
            System.out.println(String.format("Current row count: %d", rs.getInt(1)));
            conn.close();
        } catch (SQLException ex) {
            // handle any errors
            System.out.println("SQLException: " + ex.getMessage());
            System.out.println("SQLState: " + ex.getSQLState());
            System.out.println("VendorError: " + ex.getErrorCode());
        }
    }
}
Top GitHub Comments
driver: https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc8/21.3.0.0/ojdbc8-21.3.0.0.jar
docker: docker run -d -p 1521:1521 -e ORACLE_PASSWORD=123456 gvenzl/oracle-xe
jdbc url: jdbc:oracle:thin:@//localhost:1521/XEPDB1 (user: SYSTEM, pass: 123456)
test ddl:
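The DDL itself didn't survive in the archived page. A minimal sketch of what the Oracle-side test table plausibly looked like, assuming the same two-integer-column layout as the MySQL test earlier (the table name and NUMBER types are assumptions):
create table test (a number, b number)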
Please be aware that increasing batch_size comes at a cost. The larger batch_size is, the higher the chance the JDBC bridge will run out of memory, because it has to hold the whole batch in memory before sending it over to the target database. I'd suggest setting a reasonable number by considering row size (column count, size of each column, etc.), concurrency, SLA, and JDBC bridge memory configuration together. On a side note, fetch_size has a similar issue, but it only applies to queries.
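For illustration, here is one way the batch_size knob might be applied to the JDBC engine table from the original report. This is a hypothetical sketch - it assumes the bridge accepts batch_size as a parameter appended to the datasource name; check the exact parameter spelling and placement against your bridge version:
-- assumption: batch_size passed as a datasource parameter
create table jdbc_test (a Int32, b Int32)
engine = JDBC('mysql-datasource?batch_size=1000', 'test', 'test')
A larger value means fewer round trips per insert but more rows buffered in the bridge's memory at once, which is exactly the trade-off described above.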
As to whether to use a validation query like "select 1" or the standard JDBC API Connection.isValid(): that has nothing to do with the JDBC bridge but with HikariCP, the connection pool implementation. However, I do see the headache of tuning configuration for different databases - we should have templates defined in advance so that datasource configuration is just about host, port, database, and credentials. I didn't mention timeouts here, but I hope we can have a better way to configure those as well.

Lastly, to recap the issues we discussed in this thread: