batch insert - too slow
I was testing an insert query like the one below (for a JDBC engine table):
insert into jdbc_test(a, b)
select number*3, number*5 from system.numbers n limit 5000
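For context, jdbc_test above is a table using ClickHouse's JDBC table engine, whose arguments are a datasource (or connection string), a remote schema, and a remote table. The original report doesn't include its definition; a minimal sketch of how it might have been declared, where the datasource name mysql-datasource and the remote schema/table names are placeholders, not from the original report:
create table jdbc_test (a Int32, b Int32)
engine = JDBC('mysql-datasource', 'test', 'test')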
I was using the MySQL and MSSQL drivers. In both cases the results were very slow - 200-300 rows per second (using a tiny table with 2 int columns). This is about the same speed as if you'd insert rows one by one with auto commit. rewriteBatchedStatements=true was set. I did a few more tests using plain Java against MySQL: if I turn off auto commit I can get 7-8K rows per second, even when inserting one by one. Using the batch API this goes up to 80K per second.
So it sounds like when the bridge is writing data it's using very small batches (or not using batches at all?). Regarding auto commit - it is true by default, however during batch writing I think it's more logical to start a transaction, though, to avoid partial inserts (on crash). It would also speed up inserts with smaller batches.
Here is a quick and dirty snippet for testing. It assumes there is a database "test" and a table "test" with (a int, b int) columns.
import java.sql.DriverManager;
import java.sql.SQLException;

/* usage
// copy 2000 rows 1 by 1, with auto commit
java --source 11 -cp "./mysql-connector-java-8.0.26.jar" jt.java 2000
// copy 50K rows in 1000-row batches with auto commit (after each batch insertion)
java --source 11 -cp "./mysql-connector-java-8.0.26.jar" jt.java 50000 1000
// copy 50K rows in 1000-row batches, single transaction
java --source 11 -cp "./mysql-connector-java-8.0.26.jar" jt.java 50000 1000 false
*/
class Jt {
    public static void main(String[] args) {
        var rowsToCopy = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        var batchSize = args.length > 1 ? Integer.parseInt(args[1]) : 1;
        boolean autoCommit = args.length > 2 ? Boolean.parseBoolean(args[2]) : true;
        try {
            long start = System.currentTimeMillis();
            var conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost/test?user=root&password=root&rewriteBatchedStatements=true");
            conn.setAutoCommit(autoCommit);
            var stmt = conn.prepareStatement("insert into test.test values(?,?)");
            if (batchSize == 1) {
                System.out.println("No batching");
                for (int i = 1; i <= rowsToCopy; i++) {
                    stmt.setInt(1, i * 3);
                    stmt.setInt(2, i * 5);
                    stmt.executeUpdate();
                }
            } else {
                System.out.println("batch size: " + batchSize);
                for (int i = 1; i <= rowsToCopy; i++) {
                    stmt.setInt(1, i * 3);
                    stmt.setInt(2, i * 5);
                    stmt.addBatch();
                    if (i % batchSize == 0)
                        stmt.executeBatch();
                }
                stmt.executeBatch(); // flush any remaining partial batch
            }
            if (!autoCommit)
                conn.commit();
            long finish = System.currentTimeMillis();
            long dur = finish - start;
            System.out.println(String.format("copied %d rows in %,d ms (%d rows per sec). Autocommit: %b",
                    rowsToCopy, dur, rowsToCopy * 1000L / dur, autoCommit));
            var s = conn.createStatement();
            var rs = s.executeQuery("select sum(1) from test.test");
            rs.next();
            System.out.println(String.format("Current row count: %d", rs.getInt(1)));
            conn.close();
        } catch (SQLException ex) {
            // handle any errors
            System.out.println("SQLException: " + ex.getMessage());
            System.out.println("SQLState: " + ex.getSQLState());
            System.out.println("VendorError: " + ex.getErrorCode());
        }
    }
}
Top GitHub Comments
driver: https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc8/21.3.0.0/ojdbc8-21.3.0.0.jar
docker: docker run -d -p 1521:1521 -e ORACLE_PASSWORD=123456 gvenzl/oracle-xe
jdbc url: jdbc:oracle:thin:@//localhost:1521/XEPDB1 (user: SYSTEM, pass: 123456)
test ddl:
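The DDL itself didn't survive in the archived page. A minimal sketch of what the Oracle-side test table plausibly looked like, assuming the same two-integer-column layout as the MySQL test earlier (the table name and NUMBER types are assumptions):
create table test (a number, b number)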
Please be aware that increasing batch_size comes at a cost. The larger batch_size is, the higher the chance the JDBC bridge will run out of memory, because it has to hold the whole batch in memory before sending it over to the target database. I'd suggest setting a reasonable number by considering row size (column count, size of each column, etc.), concurrency, SLA, and JDBC bridge memory configuration together. On a side note, fetch_size has a similar issue, but it only applies to queries.
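For illustration, here is one way the batch_size knob might be applied to the JDBC engine table from the original report. This is a hypothetical sketch - it assumes the bridge accepts batch_size as a parameter appended to the datasource name; check the exact parameter spelling and placement against your bridge version:
-- assumption: batch_size passed as a datasource parameter
create table jdbc_test (a Int32, b Int32)
engine = JDBC('mysql-datasource?batch_size=1000', 'test', 'test')
A larger value means fewer round trips per insert but more rows buffered in the bridge's memory at once, which is exactly the trade-off described above.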
As to whether to use a validation query like "select 1" or the standard JDBC API Connection.isValid(): that has nothing to do with the JDBC bridge but with HikariCP, the connection pool implementation. However, I do see the headache of tuning configuration for different databases - we should have templates defined in advance so that datasource configuration is just about host, port, database, and credentials. I didn't mention timeouts here, but I hope we can have a better way to configure those as well.

Lastly, to recap the issues we discussed in this thread: