How to correctly create an Excel file with multiple sheets from multiple DataFrames?
In a Scala/Spark application I created two different DataFrames (Spark 2.3.3, Scala 2.11.8).
My task is to create one Excel file with two sheets, one for each DataFrame.
I decided to use the spark-excel library (0.12.0), but I am a little bit confused.
Here is my code:
import org.apache.spark.sql.Dataset
import spark.implicits._

val df1 = Seq(
  ("2019-01-01 00:00:00", "7056589658"),
  ("2019-02-02 00:00:00", "7778965896")
).toDF("DATE_TIME", "PHONE_NUMBER")
df1.show()

val df2 = Seq(
  ("2019-01-01 01:00:00", "194.67.45.126"),
  ("2019-02-02 00:00:00", "102.85.62.100"),
  ("2019-03-03 03:00:00", "102.85.62.100")
).toDF("DATE_TIME", "IP")
df2.show()

df1.write
  .format("com.crealytics.spark.excel")
  .option("dataAddress", "'First'!A1:B1000")
  .option("useHeader", "true")
  .mode("append")
  .save("/hdd/home/NNogerbek/data.xlsx")

df2.write
  .format("com.crealytics.spark.excel")
  .option("dataAddress", "'Second'!A1:B1000")
  .option("useHeader", "true")
  .mode("append")
  .save("/hdd/home/NNogerbek/data.xlsx")
In Mesos I noticed that my pretty simple code created more than 200 jobs. A new job is created every minute. All of them run code from the DataLocator.scala file, which is part of the spark-excel library. In my opinion, this is very strange behavior.
As far as I understand, the resulting Excel file is saved in the HDFS file system, right? And I need to set the path of that file in the .save() method, right? Also, I don't understand what format the dataAddress option expects. Is there any error in my code?
Issue Analytics
- Created 4 years ago
- Comments:16
Top GitHub Comments
@nightscape thank you very much for the clarification and for the wonderful library! 😃
Well, I finally found the correct way to solve my task. Here is the working code:
QUESTION: Is it necessary to specify cells in the dataAddress option? Let's say I want to set only the name of the sheet. Is that possible? I tried the following options: .option("dataAddress", "'My Sheet1'") and .option("dataAddress", "My Sheet1"), but both of them raise an error.
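On the dataAddress question: as far as I can tell from the spark-excel README for the 0.12.x line, the supported address styles are a start cell (B3), a cell range (B3:F35), either of those prefixed with a quoted sheet name ('My Sheet'!B3:F35), and a table reference (MyTable[#All]). A bare sheet name does not appear in that list, which would be consistent with both attempts above raising an error. Below is a minimal sketch, under that assumption, of building an address from a sheet name plus a start cell, so that no fixed range such as A1:B1000 is needed. DataAddress and sheetAddress are hypothetical names used for illustration only; they are not part of spark-excel.

```scala
// Hypothetical helper, NOT part of spark-excel: builds a dataAddress string
// that names a sheet and a start cell, for example 'My Sheet1'!A1.
// Assumption: the library accepts the 'Sheet'!Cell style, as its README describes.
object DataAddress {
  def sheetAddress(sheet: String, startCell: String = "A1"): String = {
    require(sheet.nonEmpty, "sheet name must not be empty")
    // The sheet name is wrapped in single quotes so that names with spaces work.
    s"'$sheet'!$startCell"
  }
}
```

With such a helper, the writes from the question could pass .option("dataAddress", DataAddress.sheetAddress("First")) and .option("dataAddress", DataAddress.sheetAddress("Second")). According to the README, writing to a start cell uses as many rows and columns as the data requires.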