
New Parquet Reader: Error while reading struct with primitive type and complex type


We ran into a problem with the New Parquet Reader. When dealing with complex data in nested structures, we got exceptions:

com.facebook.presto.spi.PrestoException: length of sub blocks differ: block 0: 2, block 1: 1
    at com.facebook.presto.hive.parquet.ParquetPageSource.getNextPage(ParquetPageSource.java:225)
    at com.facebook.presto.hive.HivePageSource.getNextPage(HivePageSource.java:204)
    at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:262)
    at com.facebook.presto.operator.Driver.processInternal(Driver.java:303)
    at com.facebook.presto.operator.Driver.lambda$processFor$6(Driver.java:234)
    at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:537)
    at com.facebook.presto.operator.Driver.processFor(Driver.java:229)
    at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:623)
    at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
    at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:463)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: length of sub blocks differ: block 0: 2, block 1: 1
    at com.facebook.presto.spi.block.InterleavedBlock.<init>(InterleavedBlock.java:48)
    at com.facebook.presto.hive.parquet.reader.ParquetReader.readStruct(ParquetReader.java:234)
    at com.facebook.presto.hive.parquet.reader.ParquetReader.readBlock(ParquetReader.java:308)
    at com.facebook.presto.hive.parquet.reader.ParquetReader.readArray(ParquetReader.java:163)
    at com.facebook.presto.hive.parquet.reader.ParquetReader.readArray(ParquetReader.java:153)
    at com.facebook.presto.hive.parquet.ParquetPageSource.getNextPage(ParquetPageSource.java:204)
    ... 12 more

According to our tests, the exception arises when a struct containing both a primitive type and a complex type is an array element.
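The issue does not include the original DDL, so the following Hive sketch of the failing shape is an assumption (table name and sample row are chosen to match the output of test 2 below):

-- Assumed Hive DDL: an array whose element is a struct mixing a
-- primitive (int) with a complex type (map) - the shape that fails.
CREATE TABLE presto_test.test2 (
  c1 ARRAY<STRUCT<p1: INT, m1: MAP<STRING, STRING>>>
)
STORED AS PARQUET;

-- Assumed sample row with two array entries, matching test 2 below
-- (FROM-less SELECT requires Hive 0.13+).
INSERT INTO TABLE presto_test.test2
SELECT ARRAY(
  NAMED_STRUCT('p1', 1, 'm1', MAP('k1', 'v1')),
  NAMED_STRUCT('p1', 2, 'm1', MAP('k2', 'v2'))
);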

Several tests with Presto version 0.177:

  1. Map with primitive types - works well:

presto> describe hive.presto_test.test1;
 Column |         Type          | Extra | Comment
--------+-----------------------+-------+---------
 c1     | map(varchar, varchar) |       |

presto> set session hive.parquet_optimized_reader_enabled=false;
presto> select * from hive.presto_test.test1;
       c1
----------------
 {k1=v1, k2=v2}

presto> set session hive.parquet_optimized_reader_enabled=true;
presto> select * from hive.presto_test.test1;
       c1
----------------
 {k1=v1, k2=v2}

  2. Array of structs with a primitive type and a map, where the array has multiple entries - throws an exception:

presto> describe hive.presto_test.test2;
 Column |                       Type                       | Extra | Comment
--------+--------------------------------------------------+-------+---------
 c1     | array(row(p1 integer, m1 map(varchar, varchar))) |       |

presto> set session hive.parquet_optimized_reader_enabled=false;
presto> select * from hive.presto_test.test2;
                    c1
------------------------------------------
 [{p1=1, m1={k1=v1}}, {p1=2, m1={k2=v2}}]

presto> set session hive.parquet_optimized_reader_enabled=true;
presto> select * from hive.presto_test.test2;
Query 20170526_075802_00029_ybb6u failed: length of sub blocks differ: block 0: 2, block 1: 1

  3. Struct with a primitive type and a map - works well:

presto> describe hive.presto_test.test3;
 Column |                    Type                    | Extra | Comment
--------+--------------------------------------------+-------+---------
 c1     | row(p1 integer, m1 map(varchar, varchar)) |       |

presto> set session hive.parquet_optimized_reader_enabled=false;
presto> select * from hive.presto_test.test3;
             c1
---------------------------
 {p1=1, m1={k1=v1, k2=v2}}

presto> set session hive.parquet_optimized_reader_enabled=true;
presto> select * from hive.presto_test.test3;
             c1
---------------------------
 {p1=1, m1={k1=v1, k2=v2}}

  4. Array of structs with a primitive type and a map, where the array has a single entry - throws a different exception:

presto> describe hive.presto_test.test4;
 Column |                       Type                       | Extra | Comment
--------+--------------------------------------------------+-------+---------
 c1     | array(row(p1 integer, m1 map(varchar, varchar))) |       |

presto> set session hive.parquet_optimized_reader_enabled=false;
presto> select * from hive.presto_test.test4;
          c1
----------------------
 [{p1=1, m1={k1=v1}}]

presto> set session hive.parquet_optimized_reader_enabled=true;
presto> select * from hive.presto_test.test4;
Query 20170526_081253_00050_ybb6u failed: Invalid position 0 in block with 1 positions

Full exception:

java.lang.IndexOutOfBoundsException: Invalid position 0 in block with 1 positions
    at com.facebook.presto.spi.block.AbstractArrayBlock.getRegionSizeInBytes(AbstractArrayBlock.java:97)
    at com.facebook.presto.spi.block.ArrayBlock.calculateSize(ArrayBlock.java:91)
    at com.facebook.presto.spi.block.ArrayBlock.getSizeInBytes(ArrayBlock.java:82)
    at com.facebook.presto.spi.Page.getSizeInBytes(Page.java:66)
    at com.facebook.presto.operator.OperatorContext.recordGetOutput(OperatorContext.java:180)
    at com.facebook.presto.operator.Driver.processInternal(Driver.java:304)
    at com.facebook.presto.operator.Driver.lambda$processFor$6(Driver.java:234)
    at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:537)
    at com.facebook.presto.operator.Driver.processFor(Driver.java:229)
    at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:623)
    at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
    at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:463)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
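Until a fixed build is rolled out, the session toggle exercised in the tests above doubles as a workaround: falling back to the old reader avoids both exceptions, at the cost of the optimized reader's performance. A minimal sketch; the cluster-wide property name is assumed from the session property and should be verified against your Presto version's documentation:

-- Per-session workaround, as used in the passing runs above:
set session hive.parquet_optimized_reader_enabled=false;

-- Assumed cluster-wide equivalent in etc/catalog/hive.properties
-- (verify the exact key for your version):
--   hive.parquet-optimized-reader.enabled=false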

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Reactions: 6
  • Comments: 7 (1 by maintainers)

Top GitHub Comments

1 reaction
zhenxiao commented, May 26, 2018

The fix is merged: https://github.com/prestodb/presto/pull/9156. Try the latest code; it should be fixed.

0 reactions
zhenxiao commented, May 26, 2020

The issue should be fixed. The New Parquet Reader has been in production since 2018.
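On an upgraded build that contains the fix, the previously failing queries can be re-checked with the optimized reader enabled (assuming the session toggle still exists in your version; table names are from the tests above):

presto> set session hive.parquet_optimized_reader_enabled=true;
presto> select * from hive.presto_test.test2; -- previously: length of sub blocks differ
presto> select * from hive.presto_test.test4; -- previously: Invalid position 0 in block with 1 positions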

