Error ArrayIndexOutOfBoundsException while inserting array
See original GitHub issue

Environment
- OS version: Linux CentOS 7
- JDK version: 8
- ClickHouse Server version: N/A (latest)
- ClickHouse Native JDBC version: 2.5.4
- (Optional) Spark version: 2.4.3 (cluster cdh-6)
- (Optional) Other components’ version: N/A
Error logs
java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 1048576
at com.github.housepower.repackaged.net.jpountz.util.SafeUtils.checkRange(SafeUtils.java:24)
at com.github.housepower.repackaged.net.jpountz.util.SafeUtils.checkRange(SafeUtils.java:32)
at com.github.housepower.repackaged.net.jpountz.lz4.LZ4JavaSafeCompressor.compress(LZ4JavaSafeCompressor.java:141)
at com.github.housepower.repackaged.net.jpountz.lz4.LZ4Compressor.compress(LZ4Compressor.java:95)
at com.github.housepower.jdbc.buffer.CompressedBuffedWriter.flushToTarget(CompressedBuffedWriter.java:75)
at com.github.housepower.jdbc.serde.BinarySerializer.flushToTarget(BinarySerializer.java:112)
at com.github.housepower.jdbc.connect.NativeClient.disconnect(NativeClient.java:153)
at com.github.housepower.jdbc.ClickHouseConnection.close(ClickHouseConnection.java:141)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:710)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:834)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:834)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2121)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2121)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1408)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Steps to reproduce
Insert an Array of String column into a ClickHouse table from a Spark job running on a YARN cluster.
Other descriptions
I’ve seen that this error was supposedly fixed, but as you can see that’s not the case: I’m using the shaded version of the plugin too.
I created a simple UDF to parse the string into an Array and then insert it:
val toArr = udf((value: String) =>
  Option(value) match {
    case None => Array.empty[String]
    case Some(v) =>
      if (v.contains(",")) v.split(",", -1).map(_.trim)
      else Array(v)
  })
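For what it’s worth, the same logic can be written as a plain Scala function (testable without a SparkSession) and then wrapped with `udf`; this also avoids shadowing `value` inside the `Some` case:

```scala
// Null-safe split of a comma-separated string into an Array[String].
// Same behavior as the UDF body above, but as a plain function so it
// can be unit-tested without a SparkSession.
def toArr(value: String): Array[String] =
  Option(value) match {
    case None                       => Array.empty[String]
    case Some(v) if v.contains(",") => v.split(",", -1).map(_.trim)
    case Some(v)                    => Array(v)
  }

// In Spark: val toArrUdf = udf(toArr _)
```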
But I still eventually get this error.
Issue Analytics
- State:
- Created 3 years ago
- Comments: 8 (4 by maintainers)
OK, I’ve worked around it.
The problem was not the data per se, but using arrays in big batched migrations. I was also getting a broken pipe error. I changed batchsize from 500_000 to 10_000 and got it working! The big question: was it only the change in batch size, or did the lz4 library help too? Maybe I’ll have time to test it later.
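For anyone hitting the same thing, here is a minimal sketch of the write settings I mean. The URL, driver class, and table name below are placeholders for my setup; `batchsize` is the standard Spark JDBC writer option and is the only relevant change:

```scala
// Placeholder connection details; only "batchsize" is the relevant change.
val writeOptions = Map(
  "url"       -> "jdbc:clickhouse://example-host:9000", // placeholder host
  "driver"    -> "com.github.housepower.jdbc.ClickHouseDriver",
  "dbtable"   -> "my_table",                            // placeholder table
  "batchsize" -> "10000"                                // was "500000"; arrays failed at that size
)

// df.write.format("jdbc").options(writeOptions).mode("append").save()
```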
Thanks for the cool plugin!
Well, in my experience a batch size of 500000 unfortunately does not work with Arrays.
Forgot to mention: when there were no complex types like Array, a 500000 batch went fine.