Can't read the whole Excel file: if the first row has 3 columns, only 3 columns are read for all other rows. Is that a bug or not?


code

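   `import java.util.Iterator;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.VoidFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SQLContext;
    import org.apache.spark.sql.SparkSession;

    // The statements below belong inside a method of the Test class (for example main).
    SparkConf conf = new SparkConf();
    SparkSession sparkSession = SparkSession.builder()
            .appName(Test.class.getName())   // appName expects a String, not a Class
            .config(conf)
            .enableHiveSupport()
            .getOrCreate();
    JavaSparkContext sc = new JavaSparkContext(sparkSession.sparkContext());

    // Read the .xls file without a header row and without schema inference.
    Dataset<Row> load = new SQLContext(sc).read().format("com.crealytics.spark.excel")
            .option("useHeader", "false")
            .option("location", "hdfs://rousit/user/shf/50550000000005.xls")
            .option("inferSchema", "false")
            .option("addColorColumns", "false")
            .option("treatEmptyValuesAsNulls", "false")
            .load();

    // Print every row; the output appears in the executor logs, not on the driver.
    JavaRDD<Row> rowJavaRDD = load.toJavaRDD();
    rowJavaRDD.foreachPartition(new VoidFunction<Iterator<Row>>() {
        @Override
        public void call(Iterator<Row> rowIterator) throws Exception {
            while (rowIterator.hasNext()) {
                Row row = rowIterator.next();
                System.out.println(">>>>>>>row" + row);
            }
        }
    });
`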

Your Environment

Include as many relevant details about the environment you experienced the bug in

  • Spark version and language: spark-version: 2.11, java/jdk1.7
  • Spark-Excel version: 0.8.3 (<dependency><groupId>com.crealytics</groupId><artifactId>spark-excel_2.11</artifactId><version>0.8.3</version></dependency>)
  • Operating System and version, cluster environment, …:

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 9

Top GitHub Comments

1 reaction
quanghgx commented, Jan 15, 2022

Back to the original query, from the title:

Can't read the whole Excel file: if the first row has 3 columns, only 3 columns are read for all other rows. Is that a bug or not?

Spark-excel should be able to read only 3 columns for all remaining rows.

  • If the header has only 3 columns, it should work out of the box
  • If the header has more than the 3 needed columns, or there is no header at all, spark-excel can accept a custom schema (with 3 columns), and this also works
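
To make the difference between "first row decides the width" and "custom schema decides the width" concrete, here is a minimal plain-Java sketch. It does not use Spark and is not spark-excel's code; the class `SchemaWidthSketch` and its methods `fitToWidth` and `readWithWidth` are hypothetical names made up for this illustration. It only mimics the behaviour reported in this issue, where one column count is applied to every row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: NOT spark-excel's implementation. It mimics the reported
// behaviour, where a single column count is applied to every row of the sheet.
class SchemaWidthSketch {

    // Fit one row to a fixed number of columns: extra cells are dropped,
    // missing cells become null.
    static List<String> fitToWidth(List<String> row, int width) {
        List<String> fitted = new ArrayList<String>(width);
        for (int i = 0; i < width; i++) {
            fitted.add(i < row.size() ? row.get(i) : null);
        }
        return fitted;
    }

    // Apply the same width to every row. Passing rows.get(0).size() mimics
    // "the first row decides"; passing an explicit number mimics a custom schema.
    static List<List<String>> readWithWidth(List<List<String>> rows, int width) {
        List<List<String>> result = new ArrayList<List<String>>();
        for (List<String> row : rows) {
            result.add(fitToWidth(row, width));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> sheet = new ArrayList<List<String>>();
        sheet.add(Arrays.asList("a", "b", "c"));
        sheet.add(Arrays.asList("1", "2", "3", "4", "5"));

        // Width taken from the first row: the cells "4" and "5" are lost.
        System.out.println(readWithWidth(sheet, sheet.get(0).size()));
        // Explicit width of 5: the second row is kept whole, the first is padded with nulls.
        System.out.println(readWithWidth(sheet, 5));
    }
}
```

In Spark terms, the explicit width plays the role of the custom schema mentioned above: whoever fixes the column list, the header row or the schema you pass in, determines how many cells survive in each row.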

Hi @zochunphy and @wenyangchou, I am going to close this ticket, feel free to reopen it.

1 reaction
zochunphy commented, Jul 10, 2019

Does spark-excel 0.10.2 only support xlsx files? Are xls files not supported? See https://github.com/crealytics/spark-excel/issues/62#issue-313632113
