Cannot perform join between points and polygon using Scala 2.11 and Spark 2.3.1
See original GitHub issueCurrently trying to join to dataframes with the following command:
val df_green_pickup = green_data.join(neighborhoods).where($"pickup_point" within $"polygon") display(df_green_pickup)
Having the following exception:
SparkException: Job aborted due to stage failure: Task 0 in stage 44.0 failed 4 times, most recent failure: Lost task 0.3 in stage 44.0 (TID 875, 10.139.64.11, executor 10): java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.codegen.ExprCode.value()Ljava/lang/String; at org.apache.spark.sql.catalyst.expressions.Within$$anonfun$doGenCode$2.apply(predicates.scala:202) at org.apache.spark.sql.catalyst.expressions.Within$$anonfun$doGenCode$2.apply(predicates.scala:180) at org.apache.spark.sql.catalyst.expressions.BinaryExpression.nullSafeCodeGen(Expression.scala:553) at org.apache.spark.sql.catalyst.expressions.Within.doGenCode(predicates.scala:180) at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:111) at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:108) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:108) at org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.create(GeneratePredicate.scala:60) at org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.generate(GeneratePredicate.scala:46) at org.apache.spark.sql.execution.SparkPlan.newPredicate(SparkPlan.scala:382) at org.apache.spark.sql.execution.joins.CartesianProductExec$$anonfun$doExecute$1.apply(CartesianProductExec.scala:84) at org.apache.spark.sql.execution.joins.CartesianProductExec$$anonfun$doExecute$1.apply(CartesianProductExec.scala:81) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$apply$24.apply(RDD.scala:830) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$apply$24.apply(RDD.scala:830) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:42) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:336) at org.apache.spark.rdd.RDD.iterator(RDD.scala:300) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:112) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:384) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
Did anyone tried this on the same versions?
Thank you
Issue Analytics
- State:
- Created 5 years ago
- Comments:9 (2 by maintainers)
Top GitHub Comments
@djpirra and @lmerchante
Magellan 1.0.5
is not compatible withSpark 2.3.1
. You have to wait until the next release or compile from source, since the master branch is already compatible with Spark 2.3.1. I tested it last week and it works just fine.It does not work with full compatibility. I have compiled from master and was able to read the points and polygons, but it didn’t worked when I try to intersect them… Something is still not working right.
Em qua, 19 de set de 2018 às 17:29, Diego Mora Cespedes < notifications@github.com> escreveu:
– Melhores Cumprimentos, Luis Simões