SHC with Spark Structured Streaming
Hi,
I have a Spark Structured Streaming application with which I'd like to write streaming data to HBase using SHC. It reads data from a location where new CSV files are continuously being created. The defined catalog works for writing a DataFrame with identical data into HBase. The key components of my streaming application are a DataStreamReader and a DataStreamWriter.
val inputDataStream = spark
  .readStream
  .option("sep", ",")
  .schema(schema)
  .csv("/path/to/data/*.csv")

inputDataStream
  .writeStream
  .outputMode("append")
  .options(
    Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "2"))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .start()
When running the application I’m getting the following message:
Exception in thread "main" java.lang.UnsupportedOperationException: Data source org.apache.spark.sql.execution.datasources.hbase does not support streamed writing
    at org.apache.spark.sql.execution.datasources.DataSource.createSink(DataSource.scala:285)
    at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:286)
    at my.package.SHCStreamingApplication$.main(SHCStreamingApplication.scala:153)
    at my.package.SHCStreamingApplication.main(SHCStreamingApplication.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Does anyone know a solution or workaround that would still let me use SHC to write structured streaming data to HBase? Thanks in advance!
Excellent, glad to help!!!
You can write your own custom sink provider that extends StreamSinkProvider. This is my implementation:
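The commenter's implementation did not survive in this copy of the thread, so what follows is only a minimal sketch of the idea, not the original code. It assumes a hypothetical package name `hbasesink`, a hypothetical option key `hbasecat` that carries the catalog string, and a Spark version on which a micro-batch can be rebuilt as a batch DataFrame from `data.rdd` (the 2.2-era behavior matching the stack trace above; newer versions may reject this step).

```scala
package hbasesink // hypothetical package name

import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog
import org.apache.spark.sql.execution.streaming.Sink
import org.apache.spark.sql.sources.StreamSinkProvider
import org.apache.spark.sql.streaming.OutputMode

// Sink that forwards every micro-batch to SHC's ordinary batch writer.
class HBaseSink(options: Map[String, String]) extends Sink {

  // "hbasecat" is a hypothetical option key holding the SHC catalog JSON.
  private val catalog: String =
    options.getOrElse("hbasecat", sys.error("option 'hbasecat' is required"))

  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    // Assumption: rebuilding a plain batch DataFrame from data.rdd is allowed
    // on the Spark version in use; later versions may refuse to re-plan a
    // streaming DataFrame this way.
    val batch = data.sparkSession.createDataFrame(data.rdd, data.schema)

    batch.write
      .options(Map(
        HBaseTableCatalog.tableCatalog -> catalog,
        HBaseTableCatalog.newTable -> "2"))
      .format("org.apache.spark.sql.execution.datasources.hbase")
      .save()
  }
}

// Provider that Spark instantiates when the class name is passed to format(...).
class HBaseSinkProvider extends StreamSinkProvider {
  override def createSink(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      partitionColumns: Seq[String],
      outputMode: OutputMode): Sink =
    new HBaseSink(parameters)
}
```

The sink does no deduplication by `batchId`, so a batch replayed after a failure is simply written again. That is usually harmless for HBase puts keyed by row key, but it is worth confirming for your data.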
This is an example of how to use it:
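The usage example is also missing from this copy of the thread. A hedged sketch of what it could look like follows, assuming a custom provider class is available under the hypothetical name `hbasesink.HBaseSinkProvider`, that it reads the catalog from a hypothetical `hbasecat` option, and that `inputDataStream` and `catalog` are the values from the question. The checkpoint path is a placeholder.

```scala
import org.apache.spark.sql.streaming.OutputMode

// inputDataStream and catalog are the values defined in the question above.
val query = inputDataStream
  .writeStream
  .format("hbasesink.HBaseSinkProvider")               // hypothetical provider class name
  .option("hbasecat", catalog)                         // hypothetical option key read by the sink
  .option("checkpointLocation", "/path/to/checkpoint") // placeholder path
  .outputMode(OutputMode.Append())
  .start()

query.awaitTermination()
```

On Spark 2.4 and later, `DataStreamWriter.foreachBatch` gives a built-in way to run a batch writer on each micro-batch, which may make a custom sink unnecessary.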