
Unexpected formula parsing on string columns

See original GitHub issue

Hi team, we are facing an issue while writing a PySpark DataFrame to xlsx. One of our string columns contains values such as `= >85`. When the DataFrame is converted to xlsx, the write breaks: the leading `=` makes the writer treat the cell as a formula, and POI's formula parser then fails on the remainder (` >85`):

 shadeio.poi.ss.formula.FormulaParseException: Parse error near char 1 '>' in specified formula ' >85'. Expected cell ref or constant literal
	at shadeio.poi.ss.formula.FormulaParser.expected(FormulaParser.java:269)
	at shadeio.poi.ss.formula.FormulaParser.parseSimpleFactor(FormulaParser.java:1553)
	at shadeio.poi.ss.formula.FormulaParser.percentFactor(FormulaParser.java:1506)
	at shadeio.poi.ss.formula.FormulaParser.powerFactor(FormulaParser.java:1493)
	at shadeio.poi.ss.formula.FormulaParser.Term(FormulaParser.java:1867)
	at shadeio.poi.ss.formula.FormulaParser.additiveExpression(FormulaParser.java:1994)
	at shadeio.poi.ss.formula.FormulaParser.concatExpression(FormulaParser.java:1978)
	at shadeio.poi.ss.formula.FormulaParser.comparisonExpression(FormulaParser.java:1935)
	at shadeio.poi.ss.formula.FormulaParser.intersectionExpression(FormulaParser.java:1908)
	at shadeio.poi.ss.formula.FormulaParser.unionExpression(FormulaParser.java:1889)
	at shadeio.poi.ss.formula.FormulaParser.parse(FormulaParser.java:2036)
	at shadeio.poi.ss.formula.FormulaParser.parse(FormulaParser.java:170)
	at shadeio.poi.xssf.usermodel.XSSFCell.setFormula(XSSFCell.java:550)
	at shadeio.poi.xssf.usermodel.XSSFCell.setCellFormulaImpl(XSSFCell.java:526)
	at shadeio.poi.ss.usermodel.CellBase.setCellFormula(CellBase.java:132)
	at shadeio.spoiwo.natures.xlsx.Model2XlsxConversions$.convertCell(Model2XlsxConversions.scala:49)
	at shadeio.spoiwo.natures.xlsx.Model2XlsxConversions$$anonfun$convertRow$4.apply(Model2XlsxConversions.scala:143)
	at shadeio.spoiwo.natures.xlsx.Model2XlsxConversions$$anonfun$convertRow$4.apply(Model2XlsxConversions.scala:143)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at shadeio.spoiwo.natures.xlsx.Model2XlsxConversions$.convertRow(Model2XlsxConversions.scala:143)
	at shadeio.spoiwo.natures.xlsx.Model2XlsxConversions$$anonfun$writeToExistingSheet$1.apply(Model2XlsxConversions.scala:156)
	at shadeio.spoiwo.natures.xlsx.Model2XlsxConversions$$anonfun$writeToExistingSheet$1.apply(Model2XlsxConversions.scala:156)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at shadeio.spoiwo.natures.xlsx.Model2XlsxConversions$.writeToExistingSheet(Model2XlsxConversions.scala:156)
	at shadeio.spoiwo.natures.xlsx.Model2XlsxConversions$$anonfun$writeToExistingWorkbook$4.apply(Model2XlsxConversions.scala:324)
	at shadeio.spoiwo.natures.xlsx.Model2XlsxConversions$$anonfun$writeToExistingWorkbook$4.apply(Model2XlsxConversions.scala:321)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
	at shadeio.spoiwo.natures.xlsx.Model2XlsxConversions$.writeToExistingWorkbook(Model2XlsxConversions.scala:321)
	at shadeio.spoiwo.natures.xlsx.Model2XlsxConversions$XlsxWorkbook.writeToExisting(Model2XlsxConversions.scala:421)
	at com.crealytics.spark.excel.ExcelFileSaver.writeToWorkbook$1(ExcelFileSaver.scala:40)
	at com.crealytics.spark.excel.ExcelFileSaver.save(ExcelFileSaver.scala:48)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:60)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
	at sun.reflect.GeneratedMethodAccessor249.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Sample code:


import tempfile

# assumes an active SparkContext `sc` (e.g. from a PySpark shell)
temp_data = [("= >85", "valid string", "valid string 2"), (">=85", """>85%
RACE updation in Veeva""", """>10.7 for HCP
> 8 for HCOs""")]

rdd = sc.parallelize(temp_data)

temp_df = rdd.toDF()

with tempfile.TemporaryDirectory() as directory:
    write_df_to_xlsx_format(temp_df, directory)

Expected Behavior

The above code should generate an xlsx file.

Possible Solution

Formula parsing should be disabled for string cells.
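Until the library offers such a switch, one stopgap is to defeat the formula detection before writing. This is a minimal sketch under the assumption (consistent with the trace above, where the `=` has been stripped and ` >85` is handed to the formula parser) that the writer flags any string beginning with `=`; `escape_leading_equals` is a hypothetical helper, not part of spark-excel's API:

```python
def escape_leading_equals(value):
    """Prefix a space so the Excel writer keeps the cell as plain text.

    Assumption: formula detection keys on a leading '=' in the raw
    string, so any other first character defeats it. This is a
    hypothetical workaround, not a spark-excel feature.
    """
    if isinstance(value, str) and value.startswith("="):
        return " " + value
    return value
```

Applied over the string columns (e.g. via a UDF), this would turn `= >85` into ` = >85`, which is written as a literal string; the extra leading space in the output is the trade-off.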

Steps to Reproduce (for bugs)

Run the sample code above.

Context

We use this plugin to convert Spark DataFrames to xlsx and then email the report to clients.

Your Environment

ENV: prod

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 11

Top GitHub Comments

1 reaction
surendra-mt commented, Jan 6, 2022

Hi @quanghgx, sorry for bothering you. We have a pending ticket for this in prod. Let us know if there is any update on this. Thanks.

1 reaction
surendra-mt commented, Dec 15, 2021

Hi @quanghgx, apologies. I thought I had included that in the main body. Here is the function code.

def write_df_to_xlsx_format(df, directory, sheetName="'Sheet1'!A1"):
    print("writing df to temp dir as xlsx file")
    (
        df.coalesce(1)
        .write.format("com.crealytics.spark.excel")
        .option("dataAddress", sheetName)
        .option("header", "true")
        .option("dateFormat", "yy-mm-dd")
        .option("timestampFormat", "yyyy-MM-dd HH:mm")
        .mode("overwrite")
        .save(directory + "/temp.xlsx")
    )