New Parquet Reader: Error while reading struct with primitive type and complex type
We faced a problem with the New Parquet Reader. When dealing with complex data in nested structures, we got exceptions:
com.facebook.presto.spi.PrestoException: length of sub blocks differ: block 0: 2, block 1: 1
	at com.facebook.presto.hive.parquet.ParquetPageSource.getNextPage(ParquetPageSource.java:225)
	at com.facebook.presto.hive.HivePageSource.getNextPage(HivePageSource.java:204)
	at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:262)
	at com.facebook.presto.operator.Driver.processInternal(Driver.java:303)
	at com.facebook.presto.operator.Driver.lambda$processFor$6(Driver.java:234)
	at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:537)
	at com.facebook.presto.operator.Driver.processFor(Driver.java:229)
	at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:623)
	at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
	at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:463)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: length of sub blocks differ: block 0: 2, block 1: 1
	at com.facebook.presto.spi.block.InterleavedBlock.<init>(InterleavedBlock.java:48)
	at com.facebook.presto.hive.parquet.reader.ParquetReader.readStruct(ParquetReader.java:234)
	at com.facebook.presto.hive.parquet.reader.ParquetReader.readBlock(ParquetReader.java:308)
	at com.facebook.presto.hive.parquet.reader.ParquetReader.readArray(ParquetReader.java:163)
	at com.facebook.presto.hive.parquet.reader.ParquetReader.readArray(ParquetReader.java:153)
	at com.facebook.presto.hive.parquet.ParquetPageSource.getNextPage(ParquetPageSource.java:204)
	... 12 more
Our tests indicate that the exception arises when an array element is a struct containing both a primitive type and a complex type.
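For context, the "length of sub blocks differ" message is a constructor-time invariant: when a struct column is assembled from per-field sub-blocks, every sub-block must report the same number of positions. Below is a minimal sketch of that invariant (modeled on the check in the stack trace, not Presto's actual code; the class and field names are our own):

```java
// Sketch of the invariant behind "length of sub blocks differ".
// Each field of a struct is read into its own block; all field blocks
// must agree on the row count before the struct block can be built.
class StructBlockSketch {
    private final int[][] fieldBlocks; // each inner array stands in for one field's block

    StructBlockSketch(int[][] fieldBlocks) {
        int expected = fieldBlocks[0].length;
        for (int i = 1; i < fieldBlocks.length; i++) {
            if (fieldBlocks[i].length != expected) {
                // Mirrors the message format seen in the exception above
                throw new IllegalArgumentException(String.format(
                        "length of sub blocks differ: block 0: %d, block %d: %d",
                        expected, i, fieldBlocks[i].length));
            }
        }
        this.fieldBlocks = fieldBlocks;
    }

    int positionCount() {
        return fieldBlocks[0].length;
    }

    public static void main(String[] args) {
        // Consistent field blocks: two rows in each field -> OK
        StructBlockSketch ok = new StructBlockSketch(new int[][] {{1, 2}, {10, 20}});
        System.out.println("positions=" + ok.positionCount());

        // Inconsistent: the primitive field yielded 2 values but the map field
        // yielded only 1 -> throws, which is the shape of the bug reported here
        try {
            new StructBlockSketch(new int[][] {{1, 2}, {10}});
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The "block 0: 2, block 1: 1" in the error therefore suggests the reader decoded 2 values for the primitive field but only 1 for the map field of the same struct.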
Several tests with Presto version 0.177:
- Map with primitive types - works well:
presto> describe hive.presto_test.test1;
Column | Type | Extra | Comment
--------+-----------------------+-------+---------
c1 | map(varchar, varchar) | |
presto> set session hive.parquet_optimized_reader_enabled=false;
presto> select * from hive.presto_test.test1;
c1
----------------
{k1=v1, k2=v2}
presto> set session hive.parquet_optimized_reader_enabled=true;
presto> select * from hive.presto_test.test1;
c1
----------------
{k1=v1, k2=v2}
- Array of structs with a primitive type and a map, where the array has multiple entries - throws an exception:
presto> describe hive.presto_test.test2;
Column | Type | Extra | Comment
--------+--------------------------------------------------+-------+---------
c1 | array(row(p1 integer, m1 map(varchar, varchar))) | |
presto> set session hive.parquet_optimized_reader_enabled=false;
presto> select * from hive.presto_test.test2;
c1
------------------------------------------
[{p1=1, m1={k1=v1}}, {p1=2, m1={k2=v2}}]
presto> set session hive.parquet_optimized_reader_enabled=true;
presto> select * from hive.presto_test.test2;
Query 20170526_075802_00029_ybb6u failed: length of sub blocks differ: block 0: 2, block 1: 1
- Struct with a primitive type and a map - works well:
presto> describe hive.presto_test.test3;
Column | Type | Extra | Comment
--------+-------------------------------------------+-------+---------
c1 | row(p1 integer, m1 map(varchar, varchar)) | |
presto> set session hive.parquet_optimized_reader_enabled=false;
presto> select * from hive.presto_test.test3;
c1
---------------------------
{p1=1, m1={k1=v1, k2=v2}}
presto> set session hive.parquet_optimized_reader_enabled=true;
presto> select * from hive.presto_test.test3;
c1
---------------------------
{p1=1, m1={k1=v1, k2=v2}}
- Array of structs with a primitive type and a map, where the array has a single entry - throws a different exception:
presto> describe hive.presto_test.test4;
Column | Type | Extra | Comment
--------+--------------------------------------------------+-------+---------
c1 | array(row(p1 integer, m1 map(varchar, varchar))) | |
presto> set session hive.parquet_optimized_reader_enabled=false;
presto> select * from hive.presto_test.test4;
c1
----------------------
[{p1=1, m1={k1=v1}}]
presto> set session hive.parquet_optimized_reader_enabled=true;
presto> select * from hive.presto_test.test4;
Query 20170526_081253_00050_ybb6u failed: Invalid position 0 in block with 1 positions
Full exception:
java.lang.IndexOutOfBoundsException: Invalid position 0 in block with 1 positions
	at com.facebook.presto.spi.block.AbstractArrayBlock.getRegionSizeInBytes(AbstractArrayBlock.java:97)
	at com.facebook.presto.spi.block.ArrayBlock.calculateSize(ArrayBlock.java:91)
	at com.facebook.presto.spi.block.ArrayBlock.getSizeInBytes(ArrayBlock.java:82)
	at com.facebook.presto.spi.Page.getSizeInBytes(Page.java:66)
	at com.facebook.presto.operator.OperatorContext.recordGetOutput(OperatorContext.java:180)
	at com.facebook.presto.operator.Driver.processInternal(Driver.java:304)
	at com.facebook.presto.operator.Driver.lambda$processFor$6(Driver.java:234)
	at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:537)
	at com.facebook.presto.operator.Driver.processFor(Driver.java:229)
	at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:623)
	at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
	at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:463)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
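Note that position 0 of a 1-position block is normally valid, so this bounds check firing suggests the block's internal state (e.g. its offsets array) disagrees with its reported position count. A hypothetical sketch of such a region check, under the assumption that an array block stores an offsets array of length positionCount + 1 (illustrative only; names and the exact validation are our assumptions, not Presto's code):

```java
// Sketch of the bounds-check pattern behind "Invalid position X in block
// with Y positions". An array block keeps per-position offsets into its
// element data; a well-formed block has offsets.length == positionCount + 1.
class ArrayBlockSketch {
    private final int[] offsets;
    private final int positionCount;

    ArrayBlockSketch(int positionCount, int[] offsets) {
        this.positionCount = positionCount;
        this.offsets = offsets;
    }

    // Size (in elements) of the region covering positions [position, position + length)
    int getRegionSize(int position, int length) {
        if (position < 0 || length < 0 || position + length > positionCount
                || position + length >= offsets.length) {
            // Mirrors the message format seen in the exception above
            throw new IndexOutOfBoundsException(String.format(
                    "Invalid position %d in block with %d positions", position, positionCount));
        }
        return offsets[position + length] - offsets[position];
    }

    public static void main(String[] args) {
        // Well-formed block: 1 position, offsets has positionCount + 1 entries
        ArrayBlockSketch good = new ArrayBlockSketch(1, new int[] {0, 3});
        System.out.println("region size=" + good.getRegionSize(0, 1));

        // Malformed block: the reader produced offsets inconsistent with the
        // position count, so even position 0 of a 1-position block fails
        ArrayBlockSketch bad = new ArrayBlockSketch(1, new int[] {0});
        try {
            bad.getRegionSize(0, 1);
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This would explain why the single-entry array case fails only when the page size is computed, rather than at construction time as in the multi-entry case.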
Issue Analytics
- Created: 6 years ago
- Reactions: 6
- Comments: 7 (1 by maintainers)
The fix is merged: https://github.com/prestodb/presto/pull/9156. Try the latest code; it should be fixed.
The issue should be fixed. The New Parquet Reader has been in production since 2018.