
Writing a DataFrame to an Excel file doesn't produce Operation in Spark Logical Plan


Hello, I’m trying to add support for spark-excel in our data lineage tracking tool, Spline.

We use Spark’s QueryExecutionListener to capture the logical plan and then build the lineage from it. Unfortunately, there is no operation in the plan for the write to Excel; the read from Excel, on the other hand, is present in the plan.
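
For context, the hook we rely on looks roughly like this. This is only a minimal sketch to illustrate the mechanism, not Spline’s actual listener, and the class name is made up:

  import org.apache.spark.sql.execution.QueryExecution
  import org.apache.spark.sql.util.QueryExecutionListener

  // Minimal sketch: print the analyzed logical plan of every successful action.
  class PlanPrintingListener extends QueryExecutionListener {
    override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
      println(s"action: $funcName")   // e.g. "save" for DataFrameWriter.save
      println(qe.analyzed.treeString) // the plan lineage is derived from
    }
    override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = ()
  }

  // Register before running any actions:
  // spark.listenerManager.register(new PlanPrintingListener)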

We will be adding support for the read operation; if you manage to make Spark include the write operation in the plan, we can add support for that in Spline as well.

// This is the code I use when trying to produce the plan:

  import spark.implicits._ // needed for the $"..." column syntax

  val df = spark.read
    .format("com.crealytics.spark.excel")
    .option("useHeader", "true") // required by spark-excel
    .load("/Users/abac720/test.xlsx")

  val res = df.select($"number" + 1, $"text")

  res.write
    .format("com.crealytics.spark.excel")
    .option("dataAddress", "'My Sheet'!B3:C35")
    .option("useHeader", "true")
    .save("data/output/batchWithDependencies/result.xlsx")

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 5

Top GitHub Comments

2 reactions
cerveada commented, Mar 2, 2020

In the end I was able to extract lineage data for both the read and the write.

I don’t know why it didn’t work from the start, but it does now, so I’m closing this ticket.
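
(A guess as to why it didn’t fire at first: the listener has to be registered before the action runs. Since Spark 2.3 it can also be wired up via configuration at session startup, which rules out ordering problems. A hypothetical sketch reusing the listener class from above; the package name com.example is made up:)

  import org.apache.spark.sql.SparkSession

  // spark.sql.queryExecutionListeners is a static conf (Spark 2.3+) that
  // instantiates listener classes when the session starts, so the listener
  // is in place before any action runs.
  val spark = SparkSession.builder()
    .appName("lineage-test")
    .config("spark.sql.queryExecutionListeners", "com.example.PlanPrintingListener")
    .getOrCreate()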

1 reaction
cerveada commented, Feb 21, 2020

OK, I will add support for reading only for now; once the new version is released, we can add support for writing as well.

Read more comments on GitHub.

Top Results From Across the Web

Spark SQL, DataFrames and Datasets Guide
With a SparkSession, applications can create DataFrames from an existing RDD, from a Hive table, or from Spark data sources. As...

Work with Apache Spark Scala DataFrames - Azure Databricks
Learn how to load and transform data using the Apache Spark Scala DataFrame API in Azure Databricks.

Spark's Logical and Physical plans … When, Why ... - Medium
An execution plan is the set of operations executed to translate a query language statement (SQL, Spark SQL, Dataframe operations etc.) ...

Spark vs Pandas, part 2 - Towards Data Science
In contrast to Pandas, Spark uses a lazy execution model. This means that when you apply some transformation to a DataFrame, the data...

Data Cleaning with Apache Spark - Notes by Louisa - GitBook
Spark will automatically create columns in a DataFrame based on the sep argument: df1 = spark.read.csv('datafile.csv.gz', sep=','). Defaults to using ,. Can still ...
