Writing a DataFrame to an Excel file doesn't produce a write operation in the Spark logical plan
Hello, I'm trying to add support for spark-excel to our data lineage tracking tool, Spline.
We use a Spark QueryExecutionListener to capture the logical plan and then build the lineage from it. Unfortunately, the plan contains no operation for the Excel write; the Excel read, on the other hand, is present in the plan.
We will add support for the read operation now, and if you manage to make Spark include the write operation in the plan, we can add support for that in Spline as well.
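For context, this is roughly how a QueryExecutionListener is registered and used to capture the plan. This is a minimal sketch, not Spline's actual implementation; the listener class name is illustrative, while `QueryExecutionListener`, its two callbacks, and `listenerManager.register` are the standard Spark API:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

// Minimal listener that prints the analyzed logical plan of every
// successfully executed action -- the same hook a lineage tool uses.
class PlanCaptureListener extends QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
    println(s"Action '$funcName' finished; analyzed plan:\n${qe.analyzed}")

  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
    println(s"Action '$funcName' failed: ${exception.getMessage}")
}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("plan-capture")
  .getOrCreate()

spark.listenerManager.register(new PlanCaptureListener)
```

Equivalently, the listener can be installed via the `spark.sql.queryExecutionListeners` configuration so that it is active for every session.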
// This is the code I use when trying to produce the plan:
val df = spark.read
  .format("com.crealytics.spark.excel")
  .option("useHeader", "true") // Required
  .load("/Users/abac720/test.xlsx")

val res = df.select($"number" + 1, $"text")

res.write
  .format("com.crealytics.spark.excel")
  .option("dataAddress", "'My Sheet'!B3:C35")
  .option("useHeader", "true")
  .save("data/output/batchWithDependencies/result.xlsx")
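One thing worth checking from inside the listener: when a V1 data source write does reach the listener, it typically surfaces as a `SaveIntoDataSourceCommand` node at the root of the analyzed plan (whether `DataFrameWriter.save` triggers the listener at all depends on the Spark version, as write actions only began emitting query-execution events in later releases). A hedged sketch of how one might detect it, with the helper name being illustrative:

```scala
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand

// Illustrative helper, meant to be called from QueryExecutionListener.onSuccess:
// returns true if the captured plan is a V1 data-source write command.
def isV1Write(qe: QueryExecution): Boolean = qe.analyzed match {
  case cmd: SaveIntoDataSourceCommand =>
    println(s"Write via ${cmd.dataSource.getClass.getName}")
    true
  case _ =>
    false
}
```

If the matched case never fires for `save`, the write is simply not being reported to the listener, which would be consistent with the behavior described in this issue.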
Issue Analytics
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 5
Top GitHub Comments
In the end I was able to extract lineage data for both the read and the write.
I don't know why it didn't work from the start, but it does now, so I'm closing this ticket.
OK, I will add support for the read operation only for now; once the new version is released, we can add support for the write as well.