
batch insert - too slow


I was testing an insert query like the one below (for a JDBC engine table).

insert into jdbc_test(a, b)
select number * 3, number * 5 from system.numbers n limit 5000

I tested with the MySQL and MSSQL drivers. In both cases the results were very slow: 200-300 rows per second (with a tiny table of two int columns). That is roughly the speed you would get inserting rows one by one with auto-commit, even though rewriteBatchedStatements=true was set. I ran a few more tests in plain Java against MySQL: with auto-commit turned off I get 7-8K rows per second even when inserting one by one, and with the batch API that goes up to 80K rows per second.

So it sounds like the bridge is writing data in very small batches (or not using batches at all?). Regarding auto-commit: it is true by default, but during batch writing I think it is more logical to start a transaction, both to avoid partial inserts (on a crash) and to speed up inserts with smaller batches.

Here is a quick-and-dirty snippet for testing. It assumes there is a database "test" containing a table "test" with columns (a int, b int).


import java.sql.DriverManager;
import java.sql.SQLException;

/* usage

// copy 2000 rows 1 by 1, with auto commit
java --source 11 -cp "./mysql-connector-java-8.0.26.jar" jt.java 2000

// copy 50K rows in 1000 row batches with auto commit (after each batch insertion)
java --source 11 -cp "./mysql-connector-java-8.0.26.jar" jt.java 50000 1000

// copy 50K rows in 1000 row batches, single transaction
java --source 11 -cp "./mysql-connector-java-8.0.26.jar" jt.java 50000 1000 false

*/

class Jt {
    public static void main(String[] args) {

        // args: [rowsToCopy] [batchSize] [autoCommit], see usage above
        var rowsToCopy = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        var batchSize = args.length > 1 ? Integer.parseInt(args[1]) : 1;
        boolean autoCommit = args.length > 2 ? Boolean.parseBoolean(args[2]) : true;

        long start = System.currentTimeMillis();

        try (var conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/test?user=root&password=root&rewriteBatchedStatements=true")) {

            conn.setAutoCommit(autoCommit);

            var stmt = conn.prepareStatement("insert into test.test values(?,?)");

            if (batchSize == 1) {
                // row-by-row inserts: one round trip per row
                System.out.println("No batching");
                for (int i = 1; i <= rowsToCopy; i++) {
                    stmt.setInt(1, i * 3);
                    stmt.setInt(2, i * 5);
                    stmt.executeUpdate();
                }
            } else {
                System.out.println("batch size: " + batchSize);
                for (int i = 1; i <= rowsToCopy; i++) {
                    stmt.setInt(1, i * 3);
                    stmt.setInt(2, i * 5);
                    stmt.addBatch();
                    if (i % batchSize == 0)
                        stmt.executeBatch(); // flush each full batch
                }
                stmt.executeBatch(); // flush the final, possibly partial, batch
            }
            if (!autoCommit)
                conn.commit(); // single transaction covering the whole copy

            long dur = System.currentTimeMillis() - start;
            System.out.println(String.format("copied %d rows in %,d ms (%d rows per sec). Autocommit: %b",
                    rowsToCopy, dur, rowsToCopy * 1000 / dur, autoCommit));

            // sanity check: total row count after the copy
            var rs = conn.createStatement().executeQuery("select count(*) from test.test");
            rs.next();
            System.out.println(String.format("Current row count: %d", rs.getInt(1)));

        } catch (SQLException ex) {
            System.out.println("SQLException: " + ex.getMessage());
            System.out.println("SQLState: " + ex.getSQLState());
            System.out.println("VendorError: " + ex.getErrorCode());
        }
    }
}

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 10

Top GitHub Comments

1 reaction
mikeTWC1984 commented, Oct 6, 2021

driver: https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc8/21.3.0.0/ojdbc8-21.3.0.0.jar

docker: docker run -d -p 1521:1521 -e ORACLE_PASSWORD=123456 gvenzl/oracle-xe

jdbc url: jdbc:oracle:thin:@//localhost:1521/XEPDB1, user: SYSTEM, password: 123456

Test DDL (note that Oracle upper-cases unquoted identifiers, so system.test and system."test" are two different tables):


CREATE TABLE system.test (a int, b int)
INSERT INTO system.test SELECT 1, 2 FROM dual

CREATE TABLE system."test" (a int, b int)
INSERT INTO system."test" SELECT 3, 4 FROM dual

SELECT * FROM system.test
SELECT * FROM system."test"
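
To see the difference from JDBC, here is a minimal sketch (assuming the container and credentials above) showing that the two names resolve to different tables:

import java.sql.DriverManager;

class OracleQuoteCheck {
    public static void main(String[] args) throws Exception {
        // Oracle upper-cases unquoted identifiers; quoted ones stay case-sensitive
        try (var conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//localhost:1521/XEPDB1", "SYSTEM", "123456")) {
            for (var sql : new String[] {
                    "SELECT * FROM system.test",         // resolves to SYSTEM.TEST -> (1, 2)
                    "SELECT * FROM system.\"test\"" }) { // resolves to "test"      -> (3, 4)
                try (var stmt = conn.createStatement(); var rs = stmt.executeQuery(sql)) {
                    rs.next();
                    System.out.printf("%s -> a=%d, b=%d%n", sql, rs.getInt(1), rs.getInt(2));
                }
            }
        }
    }
}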
0 reactions
zhicwu commented, Oct 8, 2021

Please be aware that increasing batch_size comes at a cost: the larger the batch_size, the higher the chance the JDBC bridge will run out of memory, because it has to hold the whole batch in memory before sending it to the target database. I'd suggest setting a reasonable number by weighing row size (column count, size of each column, etc.), concurrency, SLA, and the JDBC bridge's memory configuration together. On a side note, fetch_size has a similar issue, but it only applies to queries.

As to whether to use a validation query like "select 1" or the standard JDBC API Connection.isValid(): that has nothing to do with the JDBC bridge but with HikariCP, the connection pool implementation. However, I do see the headache of tuning the configuration for different databases - we should define templates in advance so that datasource configuration is just a matter of host, port, database, and credentials. I didn't mention timeouts here, but I hope we can make those easier to configure as well.
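
For reference, a minimal sketch of the two HikariCP validation styles (the JDBC URL and credentials are illustrative; when connectionTestQuery is left unset, HikariCP falls back to the JDBC 4 Connection.isValid() check):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

class PoolSetup {
    static HikariDataSource newPool(boolean useTestQuery) {
        var config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://localhost/test"); // illustrative target
        config.setUsername("root");
        config.setPassword("root");
        if (useTestQuery) {
            // legacy style: the pool runs this SQL to validate idle connections
            config.setConnectionTestQuery("select 1");
        }
        // otherwise the pool uses Connection.isValid(), preferred when the driver supports it
        return new HikariDataSource(config);
    }
}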

Lastly, to recap the issues we discussed in this thread:

  • transaction support is missing - begin -> insert batch by batch -> commit (see the sketch below)
  • Java 9+ support - I'll create a multi-release jar file in the near future, and then build a native image accordingly
  • quotation in Oracle - I'll need to investigate the issue over the weekend
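
A minimal sketch of the transactional write path the first item describes, assuming a plain JDBC connection to the target database (the table name and row layout are illustrative):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class TransactionalBatchWrite {
    // begin -> insert batch by batch -> commit; roll back on failure
    static void copy(Connection conn, int[][] rows, int batchSize) throws SQLException {
        boolean previous = conn.getAutoCommit();
        conn.setAutoCommit(false); // "begin": suspend auto-commit for the copy
        try (PreparedStatement stmt = conn.prepareStatement("insert into test.test values(?,?)")) {
            int pending = 0;
            for (int[] row : rows) {
                stmt.setInt(1, row[0]);
                stmt.setInt(2, row[1]);
                stmt.addBatch();
                if (++pending % batchSize == 0)
                    stmt.executeBatch(); // flush a full batch, still inside the transaction
            }
            stmt.executeBatch();  // flush the final, possibly partial, batch
            conn.commit();        // all rows become visible atomically
        } catch (SQLException e) {
            conn.rollback();      // no partial inserts on crash or error
            throw e;
        } finally {
            conn.setAutoCommit(previous);
        }
    }
}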