"java.lang.IllegalStateException: Cannot get a STRING value from a NUMERIC cell" when numeric values in header

This issue already appears among the closed issues, but no reason was given for closing it, and it is still not resolved. It is also mentioned here.

Expected Behavior

Ideally

val df = spark.read.format("com.crealytics.spark.excel").option("useHeader", "true").option("inferSchema", "true").load("test.xls")

should read the Excel file normally.

Current Behavior

Whenever I try to read the file, this exception occurs:

java.lang.IllegalStateException: Cannot get a STRING value from a NUMERIC cell
  at shadeio.poi.hssf.usermodel.HSSFCell.typeMismatch(HSSFCell.java:635)
  at shadeio.poi.hssf.usermodel.HSSFCell.getRichStringCellValue(HSSFCell.java:712)
  at shadeio.poi.hssf.usermodel.HSSFCell.getStringCellValue(HSSFCell.java:695)
  at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1$$anonfun$11.apply(ExcelRelation.scala:149)
  at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1$$anonfun$11.apply(ExcelRelation.scala:149)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
  at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1.apply(ExcelRelation.scala:149)
  at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1.apply(ExcelRelation.scala:147)
  at scala.Option.getOrElse(Option.scala:121)
  at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:147)
  at com.crealytics.spark.excel.ExcelRelation.<init>(ExcelRelation.scala:40)
  at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:40)
  at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:18)
  at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:12)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)

The error went away when I manually changed every cell in the first row from an integer to a string.

Possible Solution

Convert non-string column names to string while reading?
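
A minimal sketch of what that conversion could look like, assuming plain Apache POI (the unshaded org.apache.poi packages) is on the classpath. The object HeaderNames and the method headerNames are hypothetical names used only for illustration; this is not the code in spark-excel's ExcelRelation. It relies on POI's DataFormatter, which renders a cell as display text whatever its type, instead of getStringCellValue, which throws on NUMERIC cells.

```scala
import org.apache.poi.hssf.usermodel.HSSFWorkbook
import org.apache.poi.ss.usermodel.{DataFormatter, Row}

// Hypothetical helper, not part of spark-excel.
object HeaderNames {
  // DataFormatter returns the text Excel would display for a cell,
  // so it works for STRING and NUMERIC cells alike.
  private val formatter = new DataFormatter()

  // Reads every cell of the header row as a String.
  // A missing cell (getCell returns null) is formatted as an empty string.
  def headerNames(headerRow: Row): Seq[String] =
    (0 until headerRow.getLastCellNum).map { i =>
      formatter.formatCellValue(headerRow.getCell(i))
    }

  def main(args: Array[String]): Unit = {
    // Build a tiny in-memory .xls sheet with one text and one numeric header.
    val workbook = new HSSFWorkbook()
    val row = workbook.createSheet("demo").createRow(0)
    row.createCell(0).setCellValue("name")
    row.createCell(1).setCellValue(2019.0)

    println(headerNames(row).mkString(", "))
    workbook.close()
  }
}
```

With this approach a numeric header is turned into its displayed text rather than raising IllegalStateException. Whether the library should format headers this way or reject them is a design decision for the maintainers.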

Steps to Reproduce (for bugs)

Try reading this Excel file. I tried both .option("inferSchema", "true") and .option("inferSchema", "false"); the exception occurs either way.

Context

I was just trying to read an Excel file using this library.

Your Environment

I am running this in a Zeppelin notebook. The code above is the first line of my project. I am using the following versions:

Apache Spark: 2.3
spark-excel: 0.12.0 (from here)

Top GitHub Comments

2 reactions

EnverOsmanov commented, Oct 29, 2019

It worked with my example.

2 reactions

nightscape commented, Aug 5, 2019

Currently, it is assumed that headers are String cells. To change this, you could extract the two vals here into lazy vals at the class level and change this line to zip the two values. If you create a PR along with tests, I’d be happy to review 👍