[BUG]: 4.6.4 returns int arrays as all nulls instead of value
Library Version
4.6.1
OS
Mac OS
OS Architecture
ARM 64
How to reproduce?
I have code that works in 4.5.4 and reads an array of ints from a parquet file. In 4.6.1, however, I just get nulls back.
In 4.5.4 I read the values with the following call, and it works fine:
DataColumn volumes = await rowGroupReader.ReadColumnAsync(new DataField<int>("volumes.array"));
In 4.6.1 I get an error saying the field is not attached to any schema, so I updated the code to look up the DataField from the reader's schema:
ParquetSchema schema = reader.Schema;
DataField[] dataFields = schema.GetDataFields();
DataField volumesField = dataFields.First(field => field is { Name: "array", Path.FirstPart: "volumes" });
and used that DataField instead. That fixes the schema error, but the values I get back are all null instead of the actual ints.
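For anyone trying to reproduce this, the snippets above can be combined into a small program. This is only a sketch: the file name `volumes.parquet` and the `CountNulls` helper are made up for illustration, and the namespaces assume the Parquet.Net 4.x package layout.

```csharp
using System;
using System.IO;
using System.Linq;
using Parquet;
using Parquet.Data;
using Parquet.Schema;

// Hypothetical file name; substitute the file that shows the problem.
using Stream fs = File.OpenRead("volumes.parquet");
using ParquetReader reader = await ParquetReader.CreateAsync(fs);

// Take the field from the file's own schema so it is attached to that schema.
DataField volumesField = reader.Schema.GetDataFields()
    .First(field => field is { Name: "array", Path.FirstPart: "volumes" });

for (int i = 0; i < reader.RowGroupCount; i++)
{
    using ParquetRowGroupReader rowGroupReader = reader.OpenRowGroupReader(i);
    DataColumn volumes = await rowGroupReader.ReadColumnAsync(volumesField);

    // Report how many of the returned elements are null.
    Console.WriteLine(
        $"row group {i}: {CountNulls(volumes.Data)} of {volumes.Data.Length} values are null");
}

// Hypothetical helper: counts null elements in an array of any element type.
static int CountNulls(Array data) => data.Cast<object>().Count(v => v is null);
```

If the behaviour described here is present, the null count should equal the total count in every row group.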
I debugged it a little bit and I got to this call stack
DataField.UnpackDefinitions() at .../DataField.cs:line 143
new DataColumn() at .../DataColumn.cs:line 55
PackedColumn.Unpack()
DataColumnReader.<ReadAsync>d__8.MoveNext()
AsyncMethodBuilderCore.Start<Parquet.File.DataColumnReader.<ReadAsync>d__8>()
AsyncTaskMethodBuilder<DataColumn>.Start<Parquet.File.DataColumnReader.<ReadAsync>d__8>()
DataColumnReader.ReadAsync()
ParquetRowGroupReader.ReadColumnAsync()
It appears that on the line with the red arrow, the Data field is set correctly to the integers that were read.
However, after the green-highlighted line that unpacks the definitions runs, all the ints are replaced with int? values that are all null.
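To illustrate how an unpack step can turn correctly read values into nulls, here is a self-contained sketch of generic Parquet-style definition-level unpacking. It is not Parquet.Net's actual code, and `DefinitionLevelDemo` and `Unpack` are made-up names. The assumption, which this issue does not confirm, is that the reader expects a higher maximum definition level than the file actually uses.

```csharp
using System;
using System.Linq;

static class DefinitionLevelDemo
{
    // Expands densely packed values into a nullable array. A slot receives a
    // value only when its definition level equals maxDefinitionLevel; every
    // other slot stays null.
    public static int?[] Unpack(int[] packed, int[] definitionLevels, int maxDefinitionLevel)
    {
        var result = new int?[definitionLevels.Length];
        int next = 0;
        for (int i = 0; i < definitionLevels.Length; i++)
        {
            if (definitionLevels[i] == maxDefinitionLevel)
            {
                result[i] = packed[next++];
            }
        }
        return result;
    }

    static string Show(int?[] values) =>
        string.Join(",", values.Select(v => v?.ToString() ?? "null"));

    public static void Main()
    {
        int[] packed = { 10, 20, 30 };
        int[] levels = { 1, 1, 1 };   // every value is defined at level 1

        // The expected maximum level matches the file: values survive.
        Console.WriteLine(Show(Unpack(packed, levels, 1)));   // 10,20,30

        // The expected maximum level is one too high: every slot becomes null.
        Console.WriteLine(Show(Unpack(packed, levels, 2)));   // null,null,null
    }
}
```

In the second call the packed data is intact, but no definition level reaches the expected maximum, so the output is all nulls. That matches the symptom, where Data is correct before unpacking and all null afterwards.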
Failing test
No response
@deanro thanks, that explains it. 4.6 fixed schema bugs but, as a side effect, broke compatibility with legacy lists. The attached file was created in the following format: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#backward-compatibility-rules, specifically:
This is an odd way to represent lists which is not used anymore. I will fix in the next update.