Unexpected formula parsing on string columns
Hi Team,
We are facing an issue while writing a PySpark DataFrame to xlsx. One of our columns (string type) contains values such as "= >85". When converting the DataFrame to xlsx, the write breaks because the library tries to parse the value as a formula, resulting in this error:
shadeio.poi.ss.formula.FormulaParseException: Parse error near char 1 '>' in specified formula ' >85'. Expected cell ref or constant literal
at shadeio.poi.ss.formula.FormulaParser.expected(FormulaParser.java:269)
at shadeio.poi.ss.formula.FormulaParser.parseSimpleFactor(FormulaParser.java:1553)
at shadeio.poi.ss.formula.FormulaParser.percentFactor(FormulaParser.java:1506)
at shadeio.poi.ss.formula.FormulaParser.powerFactor(FormulaParser.java:1493)
at shadeio.poi.ss.formula.FormulaParser.Term(FormulaParser.java:1867)
at shadeio.poi.ss.formula.FormulaParser.additiveExpression(FormulaParser.java:1994)
at shadeio.poi.ss.formula.FormulaParser.concatExpression(FormulaParser.java:1978)
at shadeio.poi.ss.formula.FormulaParser.comparisonExpression(FormulaParser.java:1935)
at shadeio.poi.ss.formula.FormulaParser.intersectionExpression(FormulaParser.java:1908)
at shadeio.poi.ss.formula.FormulaParser.unionExpression(FormulaParser.java:1889)
at shadeio.poi.ss.formula.FormulaParser.parse(FormulaParser.java:2036)
at shadeio.poi.ss.formula.FormulaParser.parse(FormulaParser.java:170)
at shadeio.poi.xssf.usermodel.XSSFCell.setFormula(XSSFCell.java:550)
at shadeio.poi.xssf.usermodel.XSSFCell.setCellFormulaImpl(XSSFCell.java:526)
at shadeio.poi.ss.usermodel.CellBase.setCellFormula(CellBase.java:132)
at shadeio.spoiwo.natures.xlsx.Model2XlsxConversions$.convertCell(Model2XlsxConversions.scala:49)
at shadeio.spoiwo.natures.xlsx.Model2XlsxConversions$$anonfun$convertRow$4.apply(Model2XlsxConversions.scala:143)
at shadeio.spoiwo.natures.xlsx.Model2XlsxConversions$$anonfun$convertRow$4.apply(Model2XlsxConversions.scala:143)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at shadeio.spoiwo.natures.xlsx.Model2XlsxConversions$.convertRow(Model2XlsxConversions.scala:143)
at shadeio.spoiwo.natures.xlsx.Model2XlsxConversions$$anonfun$writeToExistingSheet$1.apply(Model2XlsxConversions.scala:156)
at shadeio.spoiwo.natures.xlsx.Model2XlsxConversions$$anonfun$writeToExistingSheet$1.apply(Model2XlsxConversions.scala:156)
at scala.collection.immutable.List.foreach(List.scala:392)
at shadeio.spoiwo.natures.xlsx.Model2XlsxConversions$.writeToExistingSheet(Model2XlsxConversions.scala:156)
at shadeio.spoiwo.natures.xlsx.Model2XlsxConversions$$anonfun$writeToExistingWorkbook$4.apply(Model2XlsxConversions.scala:324)
at shadeio.spoiwo.natures.xlsx.Model2XlsxConversions$$anonfun$writeToExistingWorkbook$4.apply(Model2XlsxConversions.scala:321)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at shadeio.spoiwo.natures.xlsx.Model2XlsxConversions$.writeToExistingWorkbook(Model2XlsxConversions.scala:321)
at shadeio.spoiwo.natures.xlsx.Model2XlsxConversions$XlsxWorkbook.writeToExisting(Model2XlsxConversions.scala:421)
at com.crealytics.spark.excel.ExcelFileSaver.writeToWorkbook$1(ExcelFileSaver.scala:40)
at com.crealytics.spark.excel.ExcelFileSaver.save(ExcelFileSaver.scala:48)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:60)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
at sun.reflect.GeneratedMethodAccessor249.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
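The trace suggests the failure path: spark-excel (via spoiwo) appears to treat any string cell beginning with "=" as an Excel formula and hands the remainder to POI's formula parser. A minimal sketch of that suspected heuristic (the exact detection logic inside the library is an assumption on our part; this only illustrates the symptom):

```python
# Hypothetical sketch of the formula-detection heuristic implied by the
# stack trace: a leading "=" routes the value through setCellFormula.
# The real library logic may differ.
def treated_as_formula(value: str) -> bool:
    return value.startswith("=")

cells = ["= >85", ">=85", "valid string 2"]
flags = [treated_as_formula(c) for c in cells]
# Only "= >85" is flagged, so " >85" reaches the formula parser and
# fails at the '>' character, matching the FormulaParseException above.
```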
Sample code:

import tempfile

temp_data = [
    ("= >85", "valid string", "valid string 2"),
    (">=85", """>85%
RACE updation in Veeva""", """>10.7 for HCP
> 8 for HCOs"""),
]
rdd = sc.parallelize(temp_data)
temp_df = rdd.toDF()
with tempfile.TemporaryDirectory() as directory:
    write_df_to_xlsx_format(temp_df, directory)
Expected Behavior
The above code should generate an xlsx file.
Possible Solution
Formula parsing should be disabled for string cells.
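Until something like that lands in the library, a caller-side workaround is to escape values that would be mistaken for formulas before writing. A minimal sketch, assuming a leading "=" is what triggers formula parsing; escape_formula_strings is our own hypothetical helper, not part of spark-excel:

```python
# Hypothetical workaround: prefix strings that start with "=" with an
# apostrophe, Excel's marker for literal text. Whether spark-excel/POI
# honors the apostrophe convention is an assumption; stripping or
# space-prefixing the "=" are alternatives worth trying.
def escape_formula_strings(value):
    if isinstance(value, str) and value.lstrip().startswith("="):
        return "'" + value
    return value

row = ("= >85", "valid string", ">=85")
safe_row = tuple(escape_formula_strings(v) for v in row)
```

In PySpark this could be applied to each string column as a UDF (or with regexp_replace) before calling the writer.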
Steps to Reproduce (for bugs)
Run the sample code above.
Context
We use this plugin to convert Spark DataFrames to xlsx and then send the report as an email to the client.
Your Environment
ENV: prod
Issue Analytics
- Created: 2 years ago
- Comments: 11
Top GitHub Comments
Hi @quanghgx, sorry for bothering you. We have a pending ticket for this on prod. Let us know if there is any update on this. Thanks.
Hi @quanghgx, apologies. I thought I had included that in the main body. Here is the function code.