Importing 150M+ rows from the MySQL database
Hello, I have an issue with importing a truly large table into ES using this tool.
I have a large table of about 154M records that I need to import into ES, and the import worked great up until the 30M+ mark. I suspect the problem is with how the rows are selected from the DB: when paging with OFFSET and LIMIT, InnoDB has to scan through all of the preceding rows before returning the relevant ones. I suspect this because the script works fine again for about 100k entries once I add an ‘id > 3XXXXXXX’ WHERE clause, and then dies again.
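To make the comparison concrete, this is roughly the difference I have in mind (just a sketch; the id value and batch size are placeholders, not the real numbers):

-- paging with OFFSET: InnoDB still has to walk past every skipped row before returning the batch
SELECT id, email, text, username FROM entries ORDER BY id LIMIT 20000 OFFSET 30000000;

-- keyset paging: the PRIMARY KEY lets it jump straight to the next batch
SELECT id, email, text, username FROM entries WHERE id > 30000000 ORDER BY id LIMIT 20000;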
This is the log for the script:
Wed Aug 10 16:28:48 PDT 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
[17:41:45,462][ERROR][importer.jdbc.context.standard][pool-3-thread-1] at fetch: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 337 milliseconds ago. The last packet sent successfully to the server was 4,375,856 milliseconds ago.
java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 337 milliseconds ago. The last packet sent successfully to the server was 4,375,856 milliseconds ago.
at org.xbib.elasticsearch.jdbc.strategy.standard.StandardSource.fetch(StandardSource.java:631) ~[elasticsearch-jdbc-2.3.4.0.jar:?]
at org.xbib.elasticsearch.jdbc.strategy.standard.StandardContext.fetch(StandardContext.java:191) [elasticsearch-jdbc-2.3.4.0.jar:?]
at org.xbib.elasticsearch.jdbc.strategy.standard.StandardContext.execute(StandardContext.java:166) [elasticsearch-jdbc-2.3.4.0.jar:?]
at org.xbib.tools.JDBCImporter.process(JDBCImporter.java:199) [elasticsearch-jdbc-2.3.4.0.jar:?]
at org.xbib.tools.JDBCImporter.newRequest(JDBCImporter.java:185) [elasticsearch-jdbc-2.3.4.0.jar:?]
at org.xbib.tools.JDBCImporter.newRequest(JDBCImporter.java:51) [elasticsearch-jdbc-2.3.4.0.jar:?]
at org.xbib.pipeline.AbstractPipeline.call(AbstractPipeline.java:50) [elasticsearch-jdbc-2.3.4.0.jar:?]
at org.xbib.pipeline.AbstractPipeline.call(AbstractPipeline.java:16) [elasticsearch-jdbc-2.3.4.0.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_101]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_101]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_101]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 337 milliseconds ago. The last packet sent successfully to the server was 4,375,856 milliseconds ago.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_101]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_101]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_101]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_101]
at com.mysql.jdbc.Util.handleNewInstance(Util.java:404) ~[mysql-connector-java-5.1.38.jar:5.1.38]
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:981) ~[mysql-connector-java-5.1.38.jar:5.1.38]
at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3652) ~[mysql-connector-java-5.1.38.jar:5.1.38]
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2460) ~[mysql-connector-java-5.1.38.jar:5.1.38]
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2625) ~[mysql-connector-java-5.1.38.jar:5.1.38]
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2547) ~[mysql-connector-java-5.1.38.jar:5.1.38]
at com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1454) ~[mysql-connector-java-5.1.38.jar:5.1.38]
at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:178) ~[mysql-connector-java-5.1.38.jar:5.1.38]
at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:6709) ~[mysql-connector-java-5.1.38.jar:5.1.38]
at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:851) ~[mysql-connector-java-5.1.38.jar:5.1.38]
at org.xbib.elasticsearch.jdbc.strategy.standard.StandardSource.close(StandardSource.java:1130) ~[elasticsearch-jdbc-2.3.4.0.jar:?]
at org.xbib.elasticsearch.jdbc.strategy.standard.StandardSource.execute(StandardSource.java:701) ~[elasticsearch-jdbc-2.3.4.0.jar:?]
at org.xbib.elasticsearch.jdbc.strategy.standard.StandardSource.fetch(StandardSource.java:616) ~[elasticsearch-jdbc-2.3.4.0.jar:?]
... 11 more
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method) ~[?:1.8.0_101]
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) ~[?:1.8.0_101]
at java.net.SocketOutputStream.write(SocketOutputStream.java:153) ~[?:1.8.0_101]
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[?:1.8.0_101]
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[?:1.8.0_101]
at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3634) ~[mysql-connector-java-5.1.38.jar:5.1.38]
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2460) ~[mysql-connector-java-5.1.38.jar:5.1.38]
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2625) ~[mysql-connector-java-5.1.38.jar:5.1.38]
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2547) ~[mysql-connector-java-5.1.38.jar:5.1.38]
at com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1454) ~[mysql-connector-java-5.1.38.jar:5.1.38]
at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:178) ~[mysql-connector-java-5.1.38.jar:5.1.38]
at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:6709) ~[mysql-connector-java-5.1.38.jar:5.1.38]
at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:851) ~[mysql-connector-java-5.1.38.jar:5.1.38]
at org.xbib.elasticsearch.jdbc.strategy.standard.StandardSource.close(StandardSource.java:1130) ~[elasticsearch-jdbc-2.3.4.0.jar:?]
at org.xbib.elasticsearch.jdbc.strategy.standard.StandardSource.execute(StandardSource.java:701) ~[elasticsearch-jdbc-2.3.4.0.jar:?]
at org.xbib.elasticsearch.jdbc.strategy.standard.StandardSource.fetch(StandardSource.java:616) ~[elasticsearch-jdbc-2.3.4.0.jar:?]
... 11 more
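For context on the timing in the trace: 4,375,856 ms is roughly 73 minutes during which nothing was sent to MySQL, and the broken pipe happens while the driver is closing the streaming result set. One thing I could try, assuming these Connector/J URL properties apply to this driver version, is to silence the SSL warning and keep the connection from looking idle (the values below are only examples):

"url" : "jdbc:mysql://localhost:3306/entries?useSSL=false&tcpKeepAlive=true&netTimeoutForStreamingResults=7200",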
And this is the script I use to run the import (the ID clause was added recently; the script first ran without it):
#!/bin/sh
#DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
DIR="/home/ubuntu/elasticsearch-jdbc-2.3.4.0"
bin=${DIR}/../bin
lib=${DIR}/../lib
echo '
{
"type" : "jdbc",
"jdbc" : {
"url" : "jdbc:mysql://localhost:3306/entries",
"user" : "root",
"password" : "secret",
"sql" : "SELECT id as _id, email, text, username, CONCAT_WS(\" \", first_name, last_name) as name FROM entries WHERE id > 32281918",
"treat_binary_as_string" : true,
"elasticsearch" : {
"cluster" : "searcher",
"host" : "localhost",
"port" : 9300
},
"max_bulk_actions" : 20000,
"max_concurrent_bulk_requests" : 10,
"index" : "entries",
"threadpoolsize": 1
}
}
' | java \
-cp "/home/ubuntu/elasticsearch-jdbc-2.3.4.0/lib/*" \
-Dlog4j.configurationFile=${bin}/log4j2.xml \
org.xbib.tools.Runner \
org.xbib.tools.JDBCImporter
Is my assumption above correct, and if it is, can I rewrite the query somehow to iterate using ‘WHERE id > N’ instead of OFFSET and LIMIT?
If I am horribly wrong with my assumption, how else can I make this import load all 150M records?
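In case it helps to show what I mean, this is roughly the loop I have in mind: run the importer once per id range instead of over one huge result set (a rough sketch only; the chunk size, upper bound, and paths are placeholders):

#!/bin/sh
# import the table in fixed id ranges so no single statement streams for hours
CHUNK=1000000
MAX_ID=154000000
START=0
while [ $START -lt $MAX_ID ]; do
  END=$((START + CHUNK))
  echo "{
    \"type\" : \"jdbc\",
    \"jdbc\" : {
      \"url\" : \"jdbc:mysql://localhost:3306/entries\",
      \"user\" : \"root\",
      \"password\" : \"secret\",
      \"sql\" : \"SELECT id as _id, email, text, username, CONCAT_WS(' ', first_name, last_name) as name FROM entries WHERE id > $START AND id <= $END\",
      \"treat_binary_as_string\" : true,
      \"elasticsearch\" : { \"cluster\" : \"searcher\", \"host\" : \"localhost\", \"port\" : 9300 },
      \"index\" : \"entries\"
    }
  }" | java -cp "/home/ubuntu/elasticsearch-jdbc-2.3.4.0/lib/*" \
         org.xbib.tools.Runner org.xbib.tools.JDBCImporter
  START=$END
done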
Thanks for your effort @jprante !
Comments: 5 (3 by maintainers)
Top GitHub Comments
Also, you should watch out if your network (firewall, virus checker, whatever) forces connection aborts after being idle for 1 hour.
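For reference, one way to work around such idle aborts, assuming the importer host runs Linux, is to pair the driver's tcpKeepAlive setting with a shorter kernel keepalive interval, so the connection never looks idle to devices in between (the 300 s value is just an example):

sysctl net.ipv4.tcp_keepalive_time        # default 7200 s is longer than many firewall idle limits
sudo sysctl -w net.ipv4.tcp_keepalive_time=300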
@jprante This worked great, thank you so much! 😃