Error while using spark-redshift jar
Hi,
I am getting the error below while using the spark-redshift jar to integrate Redshift with Spark locally.
Exception in thread "main" java.lang.AbstractMethodError: com.databricks.spark.redshift.RedshiftFileFormat.prepareRead(Lorg/apache/spark/sql/SparkSession;Lscala/collection/immutable/Map;Lscala/collection/Seq;)Lscala/collection/immutable/Map;
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:160)
at com.databricks.spark.redshift.RedshiftRelation.buildScan(RedshiftRelation.scala:168)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$3.apply(DataSourceStrategy.scala:141)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$3.apply(DataSourceStrategy.scala:141)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:184)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:183)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:257)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProject(DataSourceStrategy.scala:179)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:137)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:60)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:55)
at org.apache.spark.sql.execution.SparkStrategies$SpecialLimits$.apply(SparkStrategies.scala:54)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:60)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:77)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:75)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:82)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:82)
at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2462)
at org.apache.spark.sql.Dataset.head(Dataset.scala:1861)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2078)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:240)
at org.apache.spark.sql.Dataset.show(Dataset.scala:533)
at org.apache.spark.sql.Dataset.show(Dataset.scala:493)
at org.apache.spark.sql.Dataset.show(Dataset.scala:502)
at simpleSample.RedshiftToSpark$.main(RedshiftToSpark.scala:53)
at simpleSample.RedshiftToSpark.main(RedshiftToSpark.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
I find that the prepareRead method is not implemented in RedshiftFileFormat.
Thanks & Regards, Ravi
Issue Analytics
- State:
- Created 7 years ago
- Comments:37 (2 by maintainers)
Top GitHub Comments
Found the root cause: Spark 2.1 added a new method to the interface org.apache.spark.sql.execution.datasources.OutputWriterFactory:
def getFileExtension(context: TaskAttemptContext): String
spark-avro does not implement it, hence the AbstractMethodError.
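For anyone who wants to confirm this kind of mismatch on their own classpath, here is a minimal sketch of a reflection check. It is not taken from spark-redshift or spark-avro: `FileFormatLike`, `PartialFormat`, `CompleteFormat` and `hasConcreteMethod` are hypothetical stand-ins, used so the example runs without Spark. The idea is that a class compiled against an older interface exposes the newer method only as an abstract declaration inherited from the interface, which is the condition that leads to AbstractMethodError when the method is called.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class AbstractMethodCheck {
    // Hypothetical stand-in for a framework interface that gained a method.
    interface FileFormatLike {
        String shortName();
        String getFileExtension(String context); // the "newly added" method
    }

    // Stand-in for a plugin class that only implements the older method.
    static abstract class PartialFormat implements FileFormatLike {
        public String shortName() { return "partial"; }
    }

    // Stand-in for a plugin class rebuilt against the newer interface.
    static class CompleteFormat extends PartialFormat {
        public String getFileExtension(String context) { return ".avro"; }
    }

    // Returns true if cls exposes at least one public, non-abstract
    // method with the given name (inherited methods included).
    static boolean hasConcreteMethod(Class<?> cls, String name) {
        for (Method m : cls.getMethods()) {
            if (m.getName().equals(name) && !Modifier.isAbstract(m.getModifiers())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("PartialFormat implements getFileExtension: "
                + hasConcreteMethod(PartialFormat.class, "getFileExtension"));
        System.out.println("CompleteFormat implements getFileExtension: "
                + hasConcreteMethod(CompleteFormat.class, "getFileExtension"));
    }
}
```

To try it against the real jars, one could pass `Class.forName("com.databricks.spark.redshift.RedshiftFileFormat")` (the class name from the stack trace) and `"prepareRead"` to `hasConcreteMethod`, with the same Spark and connector jars on the classpath that the failing job uses. A `false` result would suggest the connector was built against a different Spark version than the one running.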
Looks like spark-avro was fixed. Any updates here?