
Importing 150M+ rows from the MySQL database


Hello, I have an issue with importing a truly large table into ES using this tool.

I have a large 154M-record table that I need to import into ES, and the import worked fine up until roughly the 30M mark. I suspect the problem is with selecting the rows from the DB: InnoDB has to scan through all of the earlier rows before it can return the relevant ones. I suspect this because the script seems to work fine for about 100k entries when I add an ‘id > 3XXXXXXX’ WHERE clause, and then it dies again.
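
For reference, the difference between offset paging and the keyset ("seek") pattern asked about further down looks roughly like this (a sketch using the entries table and the id cut-off from the config below):

-- Offset paging: InnoDB still has to walk past the first ~32M index entries
-- before it can return the page, so each deeper page gets slower.
SELECT id, email, text, username
FROM entries
ORDER BY id
LIMIT 10000 OFFSET 32281918;

-- Keyset ("seek") paging: the WHERE clause lets the primary-key index jump
-- straight to the start of the page, so the cost per page stays flat.
SELECT id, email, text, username
FROM entries
WHERE id > 32281918
ORDER BY id
LIMIT 10000;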

This is the log for the script:

Wed Aug 10 16:28:48 PDT 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
[17:41:45,462][ERROR][importer.jdbc.context.standard][pool-3-thread-1] at fetch: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet successfully received from the server was 337 milliseconds ago.  The last packet sent successfully to the server was 4,375,856 milliseconds ago.
java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet successfully received from the server was 337 milliseconds ago.  The last packet sent successfully to the server was 4,375,856 milliseconds ago.
    at org.xbib.elasticsearch.jdbc.strategy.standard.StandardSource.fetch(StandardSource.java:631) ~[elasticsearch-jdbc-2.3.4.0.jar:?]
    at org.xbib.elasticsearch.jdbc.strategy.standard.StandardContext.fetch(StandardContext.java:191) [elasticsearch-jdbc-2.3.4.0.jar:?]
    at org.xbib.elasticsearch.jdbc.strategy.standard.StandardContext.execute(StandardContext.java:166) [elasticsearch-jdbc-2.3.4.0.jar:?]
    at org.xbib.tools.JDBCImporter.process(JDBCImporter.java:199) [elasticsearch-jdbc-2.3.4.0.jar:?]
    at org.xbib.tools.JDBCImporter.newRequest(JDBCImporter.java:185) [elasticsearch-jdbc-2.3.4.0.jar:?]
    at org.xbib.tools.JDBCImporter.newRequest(JDBCImporter.java:51) [elasticsearch-jdbc-2.3.4.0.jar:?]
    at org.xbib.pipeline.AbstractPipeline.call(AbstractPipeline.java:50) [elasticsearch-jdbc-2.3.4.0.jar:?]
    at org.xbib.pipeline.AbstractPipeline.call(AbstractPipeline.java:16) [elasticsearch-jdbc-2.3.4.0.jar:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_101]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet successfully received from the server was 337 milliseconds ago.  The last packet sent successfully to the server was 4,375,856 milliseconds ago.
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_101]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_101]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_101]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_101]
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:404) ~[mysql-connector-java-5.1.38.jar:5.1.38]
    at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:981) ~[mysql-connector-java-5.1.38.jar:5.1.38]
    at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3652) ~[mysql-connector-java-5.1.38.jar:5.1.38]
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2460) ~[mysql-connector-java-5.1.38.jar:5.1.38]
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2625) ~[mysql-connector-java-5.1.38.jar:5.1.38]
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2547) ~[mysql-connector-java-5.1.38.jar:5.1.38]
    at com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1454) ~[mysql-connector-java-5.1.38.jar:5.1.38]
    at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:178) ~[mysql-connector-java-5.1.38.jar:5.1.38]
    at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:6709) ~[mysql-connector-java-5.1.38.jar:5.1.38]
    at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:851) ~[mysql-connector-java-5.1.38.jar:5.1.38]
    at org.xbib.elasticsearch.jdbc.strategy.standard.StandardSource.close(StandardSource.java:1130) ~[elasticsearch-jdbc-2.3.4.0.jar:?]
    at org.xbib.elasticsearch.jdbc.strategy.standard.StandardSource.execute(StandardSource.java:701) ~[elasticsearch-jdbc-2.3.4.0.jar:?]
    at org.xbib.elasticsearch.jdbc.strategy.standard.StandardSource.fetch(StandardSource.java:616) ~[elasticsearch-jdbc-2.3.4.0.jar:?]
    ... 11 more
Caused by: java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method) ~[?:1.8.0_101]
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) ~[?:1.8.0_101]
    at java.net.SocketOutputStream.write(SocketOutputStream.java:153) ~[?:1.8.0_101]
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[?:1.8.0_101]
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[?:1.8.0_101]
    at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3634) ~[mysql-connector-java-5.1.38.jar:5.1.38]
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2460) ~[mysql-connector-java-5.1.38.jar:5.1.38]
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2625) ~[mysql-connector-java-5.1.38.jar:5.1.38]
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2547) ~[mysql-connector-java-5.1.38.jar:5.1.38]
    at com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1454) ~[mysql-connector-java-5.1.38.jar:5.1.38]
    at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:178) ~[mysql-connector-java-5.1.38.jar:5.1.38]
    at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:6709) ~[mysql-connector-java-5.1.38.jar:5.1.38]
    at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:851) ~[mysql-connector-java-5.1.38.jar:5.1.38]
    at org.xbib.elasticsearch.jdbc.strategy.standard.StandardSource.close(StandardSource.java:1130) ~[elasticsearch-jdbc-2.3.4.0.jar:?]
    at org.xbib.elasticsearch.jdbc.strategy.standard.StandardSource.execute(StandardSource.java:701) ~[elasticsearch-jdbc-2.3.4.0.jar:?]
    at org.xbib.elasticsearch.jdbc.strategy.standard.StandardSource.fetch(StandardSource.java:616) ~[elasticsearch-jdbc-2.3.4.0.jar:?]
    ... 11 more

And this is the script I use to run the import (the id > … WHERE clause was added recently; the script originally ran without it):

#!/bin/sh

#DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
DIR="/home/ubuntu/elasticsearch-jdbc-2.3.4.0"
bin=${DIR}/../bin
lib=${DIR}/../lib

echo '
{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/entries",
        "user" : "root",
        "password" : "secret",
        "sql" :  "SELECT id as _id, email, text, username, CONCAT_WS(\" \", first_name, last_name) as name FROM entries WHERE id > 32281918",
        "treat_binary_as_string" : true,
        "elasticsearch" : {
            "cluster" : "searcher",
            "host" : "localhost",
            "port" : 9300
        },
        "max_bulk_actions" : 20000,
        "max_concurrent_bulk_requests" : 10,
        "index" : "entries",
    "threadpoolsize": 1
      }
}
' | java \
    -cp "/home/ubuntu/elasticsearch-jdbc-2.3.4.0/lib/*" \
    -Dlog4j.configurationFile=${bin}/log4j2.xml \
    org.xbib.tools.Runner \
    org.xbib.tools.JDBCImporter
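
One thing worth noting from the trace above: RowDataDynamic in the stack means the driver was already streaming the result set, and Connector/J exposes a URL property, netTimeoutForStreamingResults, that raises MySQL's net_write_timeout while such a stream is open (the default is only 600 seconds). Together with useSSL=false to silence the startup warning and tcpKeepAlive=true for firewalls that cut idle connections, a tweaked URL line for the config above might look like this (a sketch; confirm the property names and values against the Connector/J 5.1 documentation for your exact driver version):

        "url" : "jdbc:mysql://localhost:3306/entries?useSSL=false&tcpKeepAlive=true&netTimeoutForStreamingResults=7200",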

Is my assumption above correct, and if it is, can I rewrite the query somehow to iterate using ‘WHERE id > N’ instead of OFFSET and LIMIT?

If my assumption is horribly wrong, how else can I make this import work for all 150M records?
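
For concreteness, the kind of range-driven import being asked about could be a small wrapper that runs the importer once per id window, along the lines of the sketch below (this is not from the original issue; the bounds, step size, and paths are illustrative assumptions and would need adjusting):

#!/bin/sh
# Sketch of a range-driven wrapper: run the importer once per id window so that
# no single MySQL result set has to stay open for hours.
DIR="/home/ubuntu/elasticsearch-jdbc-2.3.4.0"
LOWER=0
UPPER=154000000
STEP=5000000

while [ "$LOWER" -lt "$UPPER" ]; do
    NEXT=$((LOWER + STEP))
    # Same config as above, with the SQL bounded to the current id window.
    printf '{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/entries",
        "user" : "root",
        "password" : "secret",
        "sql" : "SELECT id as _id, email, text, username, CONCAT_WS(\\" \\", first_name, last_name) as name FROM entries WHERE id > %s AND id <= %s",
        "treat_binary_as_string" : true,
        "elasticsearch" : { "cluster" : "searcher", "host" : "localhost", "port" : 9300 },
        "max_bulk_actions" : 20000,
        "max_concurrent_bulk_requests" : 10,
        "index" : "entries",
        "threadpoolsize" : 1
    }
}' "$LOWER" "$NEXT" | java \
        -cp "${DIR}/lib/*" \
        -Dlog4j.configurationFile=${DIR}/bin/log4j2.xml \
        org.xbib.tools.Runner \
        org.xbib.tools.JDBCImporter
    LOWER=$NEXT
done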

Thanks for your effort, @jprante!

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
jprante commented, Aug 12, 2016

Also, you should watch out if your network (firewall, virus checker, whatever) forces connection aborts after being idle for 1 hour.
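
That lines up with the trace above: the last packet had been sent 4,375,856 ms (roughly 73 minutes) before the failure, so anything in the path, or MySQL itself, that gives up after about an hour without traffic in one direction is a plausible culprit. A quick way to check the server-side limits (a sketch; the values shown are illustrative, and 28800 seconds is merely MySQL's common wait_timeout default, not a recommendation):

-- Check how long MySQL itself tolerates an idle or slow-draining connection.
SHOW VARIABLES LIKE 'wait_timeout';
SHOW VARIABLES LIKE 'net_write_timeout';

-- If these are shorter than the import needs, raise them.
SET GLOBAL net_write_timeout = 7200;
SET GLOBAL wait_timeout = 28800;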

0 reactions
TheWildHorse commented, Aug 16, 2016

@jprante This worked great, thank you so much! 😃

