How to correctly create an Excel file with multiple sheets from multiple DataFrames?

In a Scala/Spark application I created two different DataFrames (Spark 2.3.3, Scala 2.11.8). My task is to create one Excel file with two sheets, one for each DataFrame. I decided to use the spark-excel library (0.12.0), but I am a little bit confused.

Here is my code:

import org.apache.spark.sql.Dataset
import spark.implicits._

val df1 = Seq(
    ("2019-01-01 00:00:00", "7056589658"),
    ("2019-02-02 00:00:00", "7778965896")
).toDF("DATE_TIME", "PHONE_NUMBER")

df1.show()

val df2 = Seq(
    ("2019-01-01 01:00:00", "194.67.45.126"),
    ("2019-02-02 00:00:00", "102.85.62.100"),
    ("2019-03-03 03:00:00", "102.85.62.100")
).toDF("DATE_TIME", "IP")

df2.show()

df1.write
    .format("com.crealytics.spark.excel")
    .option("dataAddress", "'First'!A1:B1000")
    .option("useHeader", "true")
    .mode("append")
    .save("/hdd/home/NNogerbek/data.xlsx")

df2.write
    .format("com.crealytics.spark.excel")
    .option("dataAddress", "'Second'!A1:B1000")
    .option("useHeader", "true")
    .mode("append")
    .save("/hdd/home/NNogerbek/data.xlsx")

In Mesos I noticed that my pretty simple code created more than 200 jobs. A new job is created every minute. All of them run code from the DataLocator.scala file inside the spark-excel library. In my opinion, this is very strange behavior.

[Screenshot]

As far as I understand, the resulting Excel file is saved to the HDFS file system, right? And I need to set the path of that file in the .save() method, right? Also, I don't understand what format the dataAddress option expects. Is there any error in my code?

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 16

Top GitHub Comments

2 reactions
nurzhannogerbek commented, Aug 30, 2019

@nightscape thank you very much for the clarification and for the wonderful library! 😃

1 reaction
nurzhannogerbek commented, Aug 29, 2019

Well, I finally found the correct way to solve my task. Here is the working code:

import spark.implicits._

val df1 = Seq(
  ("2019-01-01 00:00:00", "7056589658"),
  ("2019-02-02 00:00:00", "7778965896")
).toDF("DATE_TIME", "PHONE_NUMBER")

df1.show()

val df2 = Seq(
  ("2019-01-01 01:00:00", "194.67.45.126"),
  ("2019-02-02 00:00:00", "102.85.62.100"),
  ("2019-03-03 03:00:00", "102.85.62.100")
).toDF("DATE_TIME", "IP")

df2.show()

df1.coalesce(1).write
  .format("com.crealytics.spark.excel")
  .option("dataAddress", "'My Sheet1'!A1:Z1000000")
  .option("useHeader", "true")
  .option("dateFormat", "yy-mmm-d")
  .option("timestampFormat", "mm-dd-yyyy hh:mm:ss")
  .mode("append")
  .save("/hdd/home/NNogerbek/data.xlsx")

df2.coalesce(1).write
  .format("com.crealytics.spark.excel")
  .option("dataAddress", "'My Sheet2'!A1:Z1000000")
  .option("useHeader", "true")
  .option("dateFormat", "yy-mmm-d")
  .option("timestampFormat", "mm-dd-yyyy hh:mm:ss")
  .mode("append")
  .save("/hdd/home/NNogerbek/data.xlsx")

QUESTION: Is it necessary to specify cells in the dataAddress option? Let's say I want to set only the name of the sheet. Is that possible? I tried the following options: .option("dataAddress", "'My Sheet1'") and .option("dataAddress", "My Sheet1"), but both of them raise an error.
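
A note on the dataAddress formats, for anyone landing here with the same question. The spark-excel README lists these address styles: a start cell (B3), a cell range (B3:F35), a sheet-qualified address ('My Sheet'!B3:F35), and a table reference (MyTable[#All]). A bare sheet name is not among them, which matches the errors described above. The sketch below assumes that a sheet-qualified start cell with no end cell ('My Sheet1'!A1) is accepted by the library version in use, which I have not verified against 0.12.0. The sheetAddress helper is hypothetical and not part of spark-excel.

```scala
// Hypothetical helper (NOT part of spark-excel): builds a sheet-qualified
// dataAddress that names only the sheet and the top-left cell, so no
// explicit end of the cell range has to be chosen.
def sheetAddress(sheet: String, startCell: String = "A1"): String =
  s"'$sheet'!$startCell"

val firstAddress  = sheetAddress("My Sheet1")
val secondAddress = sheetAddress("My Sheet2")

// Assumed usage with the DataFrames from the comment above
// (requires a SparkSession and the spark-excel dependency):
//
// df1.coalesce(1).write
//   .format("com.crealytics.spark.excel")
//   .option("dataAddress", firstAddress)
//   .option("useHeader", "true")
//   .mode("append")
//   .save("/hdd/home/NNogerbek/data.xlsx")
```

If the start-cell form is rejected, the full range form from the working code above ('My Sheet1'!A1:Z1000000) remains the fallback.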
