com.databricks.spark.redshift.RedshiftFileFormat.supportDataType(Lorg/apache/spark/sql/types/DataType;Z)Z
I’m attempting to run a simple query against a Redshift table and am receiving the following error.
Exception in thread "main" java.lang.AbstractMethodError: com.databricks.spark.redshift.RedshiftFileFormat.supportDataType(Lorg/apache/spark/sql/types/DataType;Z)Z
at org.apache.spark.sql.execution.datasources.DataSourceUtils$$anonfun$verifySchema$1.apply(DataSourceUtils.scala:48)
at org.apache.spark.sql.execution.datasources.DataSourceUtils$$anonfun$verifySchema$1.apply(DataSourceUtils.scala:47)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at org.apache.spark.sql.types.StructType.foreach(StructType.scala:99)
at org.apache.spark.sql.execution.datasources.DataSourceUtils$.verifySchema(DataSourceUtils.scala:47)
at org.apache.spark.sql.execution.datasources.DataSourceUtils$.verifyReadSchema(DataSourceUtils.scala:39)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:400)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at com.databricks.spark.redshift.RedshiftRelation.buildScan(RedshiftRelation.scala:168)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:293)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:293)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:326)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:325)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProjectRaw(DataSourceStrategy.scala:403)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProject(DataSourceStrategy.scala:321)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy.apply(DataSourceStrategy.scala:289)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3360)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2545)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2759)
at org.apache.spark.sql.Dataset.getRows(Dataset.scala:255)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:292)
at org.apache.spark.sql.Dataset.show(Dataset.scala:746)
at org.apache.spark.sql.Dataset.show(Dataset.scala:705)
at org.apache.spark.sql.Dataset.show(Dataset.scala:714)
at LogChecks$.main(LogChecks.scala:90)
at LogChecks.main(LogChecks.scala)
18/12/12 10:14:24 INFO SparkContext: Invoking stop() from shutdown hook
18/12/12 10:14:24 INFO SparkUI: Stopped Spark web UI at http://10.75.8.183:4040
18/12/12 10:14:24 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/12/12 10:14:24 INFO MemoryStore: MemoryStore cleared
18/12/12 10:14:24 INFO BlockManager: BlockManager stopped
18/12/12 10:14:24 INFO BlockManagerMaster: BlockManagerMaster stopped
18/12/12 10:14:24 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/12/12 10:14:24 INFO SparkContext: Successfully stopped SparkContext
18/12/12 10:14:24 INFO ShutdownHookManager: Shutdown hook called
18/12/12 10:14:24 INFO ShutdownHookManager: Deleting directory /private/var/folders/01/333yn1tn7rb93sspgwp_z_p804ccgt/T/spark-7c91c661-11e5-4b22-8d9f-2afe851dcdd6
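The AbstractMethodError points to a binary incompatibility rather than a problem with the query itself: Spark 2.4 added supportDataType(DataType, Boolean) to the FileFormat trait and calls it from DataSourceUtils.verifySchema for every column, so a spark-redshift JAR compiled against an earlier Spark version has no implementation of that method at runtime.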
CheckRedshift.scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.{SparkConf, SparkContext}

object CheckRedshift {
  def main(args: Array[String]): Unit = {
    val awsAccessKeyId = "asdfasdf"
    val awsSecretAccessKey = "asdlfjakfalksdf"
    val redshiftDBName = "dbName"
    val redshiftUserId = "*****"
    val redshiftPassword = "*****"
    val redshifturl = "blah-blah.us-west-1.redshift.amazonaws.com:5555"
    val jdbcURL = s"jdbc:redshift://$redshifturl/$redshiftDBName?user=$redshiftUserId&password=$redshiftPassword"
    println(jdbcURL)

    val sc = new SparkContext(new SparkConf().setAppName("SparkSQL").setMaster("local"))

    // Configure the Hadoop layer to communicate with AWS so the connector
    // can use the S3 tempdir for unloaded data
    val tempS3Dir = "s3a://mys3bucket/temp"
    sc.hadoopConfiguration.set("fs.s3a.access.key", awsAccessKeyId)
    sc.hadoopConfiguration.set("fs.s3a.secret.key", awsSecretAccessKey)

    // Create the SparkSession
    val sqlContext = SparkSession.builder.config(sc.getConf).getOrCreate()
    import sqlContext.implicits._

    val myQuery =
      """SELECT * FROM public.redshift_table
        |WHERE date_time >= '2018-01-01'
        |  AND date_time < '2018-01-02'""".stripMargin

    val df = sqlContext.read
      .format("com.databricks.spark.redshift")
      .option("url", jdbcURL)
      .option("tempdir", tempS3Dir)
      .option("forward_spark_s3_credentials", "true")
      .option("query", myQuery)
      .load()

    df.show()
  }
}
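As a stop-gap (not from the original report), the same query can be run through Spark's built-in JDBC source, which bypasses RedshiftFileFormat entirely at the cost of losing the parallel S3 unload path. A minimal sketch reusing the vals above; the driver class name is an assumption tied to the redshift-jdbc4 artifact, and the query option requires Spark 2.4:

// Hypothetical fallback: plain JDBC read, no S3 unload (slower for large
// results) but unaffected by the Spark 2.4 FileFormat change.
val jdbcDf = sqlContext.read
  .format("jdbc")
  .option("url", jdbcURL)
  .option("driver", "com.amazon.redshift.jdbc4.Driver") // verify against your driver version
  .option("query", myQuery) // the "query" option is new in Spark 2.4
  .load()
jdbcDf.show()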
build.sbt
name := "log_check"
version := "0.1"
scalaVersion := "2.11.8"
// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0"
// https://mvnrepository.com/artifact/org.apache.spark/spark-sql
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0"
libraryDependencies += "org.json4s" %% "json4s-native" % "3.2.9"
//https://github.com/databricks/spark-redshift
resolvers += "jitpack" at "https://jitpack.io"
libraryDependencies += "com.github.databricks" %% "spark-redshift" % "master-SNAPSHOT"
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.6.0"
libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.11.407"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0"
libraryDependencies += "net.java.dev.jets3t" % "jets3t" % "0.9.4"
// https://mvnrepository.com/artifact/com.amazon/redshift-jdbc41
resolvers += "redshift" at "http://redshift-maven-repository.s3-website-us-east-1.amazonaws.com/release"
libraryDependencies += "com.amazon.redshift" % "redshift-jdbc4" % "1.2.10.1009"
// Temporary fix for: https://github.com/databricks/spark-redshift/issues/315
dependencyOverrides += "com.databricks" % "spark-avro_2.11" % "3.2.0"
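The workaround raised in the comments below is to stay on Spark 2.3.x until the connector catches up with 2.4. In build.sbt that amounts to pinning the Spark artifacts; a sketch with the version taken from the comments, everything else unchanged:

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.1"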
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
So, we’ve started a community edition of spark-redshift which works with Spark 2.4. Feel free to try it out! If you do, it’d be very helpful to receive your feedback. Any pull requests are very welcome too.
https://github.com/spark-redshift-community/spark-redshift
pinging @andybrnr, @Bennyelg, @julienbachmann who were having issues with 2.4
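For anyone wanting to try the community edition from sbt, a sketch of the dependency swap; the coordinates and version below are assumptions, so check the linked repository's README for the published ones:

// Assumed Maven coordinates for the community fork; verify before use.
libraryDependencies += "io.github.spark-redshift-community" %% "spark-redshift" % "4.0.1"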
This issue is related to Spark 2.4. A workaround is to use Spark 2.3.1. Unfortunately, it seems nobody is working on adding support for 2.4; see #418.
It looks like this fork https://github.com/Yelp/spark-redshift added support for Spark 2.4, but I didn’t test it. To be checked.
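Since the build.sbt above already has the jitpack resolver, trying the Yelp fork would be a one-line swap; a sketch following jitpack's com.github.&lt;user&gt; convention (untested, as the comment says):

// Hypothetical: pull the Yelp fork via jitpack instead of the databricks master snapshot.
libraryDependencies += "com.github.Yelp" %% "spark-redshift" % "master-SNAPSHOT"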