
Error ArrayIndexOutOfBoundsException while inserting array

See original GitHub issue

Environment

  • OS version: Linux CentOS 7
  • JDK version: 8
  • ClickHouse Server version: N/A (latest)
  • ClickHouse Native JDBC version: 2.5.4
  • (Optional) Spark version: 2.4.3 (cluster cdh-6)
  • (Optional) Other components’ version: N/A

Error logs

java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 1048576
	at com.github.housepower.repackaged.net.jpountz.util.SafeUtils.checkRange(SafeUtils.java:24)
	at com.github.housepower.repackaged.net.jpountz.util.SafeUtils.checkRange(SafeUtils.java:32)
	at com.github.housepower.repackaged.net.jpountz.lz4.LZ4JavaSafeCompressor.compress(LZ4JavaSafeCompressor.java:141)
	at com.github.housepower.repackaged.net.jpountz.lz4.LZ4Compressor.compress(LZ4Compressor.java:95)
	at com.github.housepower.jdbc.buffer.CompressedBuffedWriter.flushToTarget(CompressedBuffedWriter.java:75)
	at com.github.housepower.jdbc.serde.BinarySerializer.flushToTarget(BinarySerializer.java:112)
	at com.github.housepower.jdbc.connect.NativeClient.disconnect(NativeClient.java:153)
	at com.github.housepower.jdbc.ClickHouseConnection.close(ClickHouseConnection.java:141)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:710)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:834)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:834)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2121)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2121)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1408)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Steps to reproduce

Insert an Array(String) column into a ClickHouse table from a Spark job running on a YARN cluster.
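
A minimal reproduction sketch, assuming a Spark 2.4 job with the native JDBC driver on the classpath; the host, database, and table names are placeholders, not taken from the original report:

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder.appName("array-insert-repro").getOrCreate()
  import spark.implicits._

  // A DataFrame with an Array(String) column, like the failing job's.
  val df = Seq((1, Array("a", "b")), (2, Array("c"))).toDF("id", "tags")

  df.write
    .format("jdbc")
    .option("driver", "com.github.housepower.jdbc.ClickHouseDriver")
    .option("url", "jdbc:clickhouse://ch-host:9000") // native protocol port; placeholder host
    .option("dbtable", "my_db.my_table")             // placeholder target table
    .mode("append")
    .save()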

Other descriptions

I’ve seen that this error was supposedly fixed, but as the stack trace shows, that’s not the case here: I’m using the shaded version of the plugin too.

Created a simple UDF to parse a string into an array and then insert it:

  import org.apache.spark.sql.functions.udf

  // Null-safe parse: split a comma-separated string, keeping empty fields.
  val toArr = udf((value: String) =>
    Option(value) match {
      case None    => Array.empty[String] // null input -> empty array
      case Some(v) =>
        if (v.contains(",")) v.split(",", -1).map(_.trim)
        else Array(v)
    })
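
For context, a hypothetical application of the UDF (the column names are illustrative, not from the issue):

  import org.apache.spark.sql.functions.col

  // "tags_csv" is a hypothetical source column of comma-separated strings.
  val withArr = df.withColumn("tags", toArr(col("tags_csv")))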

But I still eventually get this error.

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
ccellado commented, Mar 19, 2021

Ok, I’ve worked around it.

The problem was not the data per se, but inserting arrays in large-batch migrations.

  1. Cloned master, where the lz4 library has been replaced with aircompressor (@pan3793’s change).
  2. Built it with the spark-integration module for Scala 2.11 and Spark 2.4.3 (had to change the pom files and remove the Spark 3 module).
  3. Hit a new problem: a broken-pipe error. Changed the batch size from 500,000 to 10,000 and got it working (see the sketch after this list)!
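
For reference, a sketch of that batch-size change, assuming the standard Spark JDBC writer ("batchsize" is a built-in Spark JDBC write option; the connection details are placeholders):

  df.write
    .format("jdbc")
    .option("driver", "com.github.housepower.jdbc.ClickHouseDriver")
    .option("url", "jdbc:clickhouse://ch-host:9000") // placeholder host
    .option("dbtable", "my_db.my_table")             // placeholder table
    .option("batchsize", "10000")                    // was 500000; 10000 worked with Array columns
    .mode("append")
    .save()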

The big question: was it only the change in batch size, or did the lz4 library swap help too? Maybe I’ll have time to test that later.

Thanks for the cool plugin!

0 reactions
ccellado commented, Mar 19, 2021

Well, in my experience a batch size of 500,000 unfortunately does not work with Arrays.

Forgot to mention: when there were no complex types like Array, a 500,000-row batch went fine.


