"java.lang.IllegalStateException: Cannot get a STRING value from a NUMERIC cell" when the header contains numeric values
This issue already appears among the closed issues, with no reason given for why it was closed. It is still not resolved.
This issue is also mentioned here.
Expected Behavior
Ideally
val df = spark.read.format("com.crealytics.spark.excel").option("useHeader", "true").option("inferSchema", "true").load("test.xls")
should read the excel file normally.
Current Behavior
Whenever I try to read the file, this exception occurs:
java.lang.IllegalStateException: Cannot get a STRING value from a NUMERIC cell
at shadeio.poi.hssf.usermodel.HSSFCell.typeMismatch(HSSFCell.java:635)
at shadeio.poi.hssf.usermodel.HSSFCell.getRichStringCellValue(HSSFCell.java:712)
at shadeio.poi.hssf.usermodel.HSSFCell.getStringCellValue(HSSFCell.java:695)
at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1$$anonfun$11.apply(ExcelRelation.scala:149)
at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1$$anonfun$11.apply(ExcelRelation.scala:149)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1.apply(ExcelRelation.scala:149)
at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1.apply(ExcelRelation.scala:147)
at scala.Option.getOrElse(Option.scala:121)
at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:147)
at com.crealytics.spark.excel.ExcelRelation.<init>(ExcelRelation.scala:40)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:40)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:18)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:12)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
The error got resolved when I manually changed all first-row-cell-values to strings instead of integers.
Possible Solution
Convert non-string column names to string while reading?
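A minimal sketch of what that conversion could look like, assuming a simplified cell model. The `HeaderCell` types and the `HeaderNames` helper are hypothetical and only illustrate the idea; they are not part of spark-excel or POI.

```scala
// Hypothetical, simplified stand-in for a spreadsheet cell (not the POI or spark-excel API).
sealed trait HeaderCell
final case class StringCell(value: String) extends HeaderCell
final case class NumericCell(value: Double) extends HeaderCell
final case class BooleanCell(value: Boolean) extends HeaderCell
case object BlankCell extends HeaderCell

object HeaderNames {
  // Render any header cell as a column name instead of assuming it holds a string.
  def headerName(cell: HeaderCell, index: Int): String = cell match {
    case StringCell(s) => s
    // Whole numbers such as 2019.0 become "2019" rather than "2019.0".
    case NumericCell(d) if d == d.toLong.toDouble => d.toLong.toString
    // Any other numeric value keeps its default Double rendering.
    case NumericCell(d) => d.toString
    case BooleanCell(b) => b.toString
    // Empty header cells fall back to a positional name.
    case BlankCell => s"_c$index"
  }

  // Convert a whole header row, passing each cell's position for the fallback name.
  def headerNames(cells: Seq[HeaderCell]): Seq[String] =
    cells.zipWithIndex.map { case (cell, i) => headerName(cell, i) }
}
```

For example, `HeaderNames.headerNames(Seq(StringCell("name"), NumericCell(2019.0)))` yields the column names `name` and `2019`. In POI itself, `DataFormatter.formatCellValue(cell)` returns a cell's displayed text regardless of its type, which might be one way to do the same thing inside the library.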
Steps to Reproduce (for bugs)
Try reading this excel file.
I tried reading this with .option("inferSchema", "true")
as well as .option("inferSchema", "false")
but the exception kept on happening.
Context
I was just trying to read an excel file using this library.
Your Environment
I am using a Zeppelin notebook to run this. The code above is the first line of my project.
I am using the following versions:
Apache Spark => 2.3
Spark-excel => 0.12.0
from here.
Issue Analytics
- Created 4 years ago
- Reactions:1
- Comments:11
Top GitHub Comments
On my example it worked.
Currently, it is assumed that headers are String cells. To change this, you could extract the two vals here into lazy vals at the class level and change this line to zip the two values. If you create a PR along with tests, I’d be happy to review 👍