[BUG]: 4.6.1 returns int arrays as all nulls instead of values

Library Version

4.6.1

OS

Mac OS

OS Architecture

ARM 64

How to reproduce?

I have code that works in 4.5.4 and reads an array of ints from a Parquet file. In 4.6.1, however, I only get nulls back.

In 4.5.4 I read the values with the following call, and it works fine.

DataColumn volumes = await rowGroupReader.ReadColumnAsync(new DataField<int>("volumes.array"));

In 4.6.1 I get an error saying the field is not attached to any schema, so I updated the code to look up the DataField from the reader:

ParquetSchema schema = reader.Schema;
DataField[] dataFields = schema.GetDataFields();
DataField volumesField = dataFields.First(field => field is { Name: "array", Path.FirstPart: "volumes" });

I then use that DataField instead. That fixes the schema error, but the values I get back are all null instead of the actual ints.

I debugged it a little and got to this call stack:

DataField.UnpackDefinitions() at .../DataField.cs:line 143
new DataColumn() at .../DataColumn.cs:line 55
PackedColumn.Unpack()
DataColumnReader.<ReadAsync>d__8.MoveNext()
AsyncMethodBuilderCore.Start<Parquet.File.DataColumnReader.<ReadAsync>d__8>()
AsyncTaskMethodBuilder<DataColumn>.Start<Parquet.File.DataColumnReader.<ReadAsync>d__8>()
DataColumnReader.ReadAsync()
ParquetRowGroupReader.ReadColumnAsync()

It appears that on the line with the red arrow, the Data field is set correctly to the read integers.

image

However, after the green highlighted line runs and unpacks the definitions, all the int values are replaced with int? values that are all null.
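
To make the symptom concrete, here is a minimal, self-contained sketch of how definition levels are commonly used to rebuild a nullable array. It is not Parquet.Net's actual implementation: `Unpack`, its parameters, and the sample levels are hypothetical. It assumes the usual Parquet rule that a slot holds a value only when its definition level equals the column's maximum definition level. Under that assumption, a reader that expects a maximum one higher than the file actually uses would turn every correctly read value into null.

```csharp
using System;
using System.Linq;

public static class DefinitionLevelDemo
{
    // Hypothetical helper, not Parquet.Net's API: rebuilds a nullable array
    // from densely packed values plus one definition level per slot.
    public static int?[] Unpack(int[] packedValues, int[] definitionLevels, int maxDefinitionLevel)
    {
        var result = new int?[definitionLevels.Length];
        int next = 0; // index of the next unread packed value

        for (int i = 0; i < definitionLevels.Length; i++)
        {
            // A slot holds a value only when its level equals the assumed maximum;
            // otherwise it stays null.
            if (definitionLevels[i] == maxDefinitionLevel && next < packedValues.Length)
            {
                result[i] = packedValues[next++];
            }
        }
        return result;
    }

    public static void Main()
    {
        int[] values = { 10, 20, 30 };
        int[] levels = { 2, 2, 2 }; // sample data: every element is defined at level 2

        int?[] matching = Unpack(values, levels, maxDefinitionLevel: 2);
        int?[] mismatched = Unpack(values, levels, maxDefinitionLevel: 3);

        // prints "10, 20, 30"
        Console.WriteLine(string.Join(", ", matching.Select(v => v?.ToString() ?? "null")));
        // prints "null, null, null"
        Console.WriteLine(string.Join(", ", mismatched.Select(v => v?.ToString() ?? "null")));
    }
}
```

Whether a level mismatch of this kind is what actually happens inside `UnpackDefinitions()` is not confirmed here; the sketch only shows that correctly read data plus a wrong expected level is enough to produce an all-null result.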

Failing test

No response

Issue Analytics

  • State: closed
  • Created 6 months ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

1 reaction
aloneguid commented, Apr 25, 2023

Awesome! I appreciate your feedback and your keen eye for spotting the bug. You know what would make me even happier? If you give this project a star on GitHub, it would really boost my morale and motivation. Come on, you know you want to. 😉

1 reaction
aloneguid commented, Mar 29, 2023

@deanro thanks, that explains it. 4.6 fixed schema bugs but, as a side effect, broke compatibility with legacy lists. The attached file was created in the following format: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#backward-compatibility-rules, specifically:

// List<Integer> (nullable list, non-null elements)
optional group my_list (LIST) {
  repeated int32 element;
}

This is an odd way to represent lists that is no longer used. I will fix it in the next update.
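
As a rough illustration of why the legacy layout needs special handling, the sketch below computes the maximum definition and repetition levels along a column path for the legacy two-level list quoted above and for the three-level list layout described in the same specification. `Repetition` and `Levels` are hypothetical names, not Parquet.Net types. The only assumption is the standard Parquet rule that every optional or repeated node on the path adds one definition level and every repeated node adds one repetition level.

```csharp
using System;
using System.Collections.Generic;

public enum Repetition { Required, Optional, Repeated }

public static class LevelDemo
{
    // Hypothetical helper: walks a column path from the root to the leaf
    // and counts the levels each node contributes.
    public static (int MaxDefinition, int MaxRepetition) Levels(IEnumerable<Repetition> path)
    {
        int def = 0, rep = 0;
        foreach (Repetition r in path)
        {
            if (r != Repetition.Required) def++;   // optional and repeated nodes
            if (r == Repetition.Repeated) rep++;   // repeated nodes only
        }
        return (def, rep);
    }

    public static void Main()
    {
        // Legacy two-level list:
        // optional group my_list (LIST) { repeated int32 element; }
        var legacy = Levels(new[] { Repetition.Optional, Repetition.Repeated });

        // Three-level list with nullable elements:
        // optional group my_list (LIST) { repeated group list { optional int32 element; } }
        var threeLevel = Levels(new[]
        {
            Repetition.Optional, Repetition.Repeated, Repetition.Optional
        });

        // prints "legacy: def=2 rep=1"
        Console.WriteLine($"legacy: def={legacy.MaxDefinition} rep={legacy.MaxRepetition}");
        // prints "three-level: def=3 rep=1"
        Console.WriteLine($"three-level: def={threeLevel.MaxDefinition} rep={threeLevel.MaxRepetition}");
    }
}
```

The two layouts describe the same logical list but place the leaf at different depths, so a reader has to recognise the legacy shape explicitly rather than assume three levels. The thread does not say exactly which computation 4.6 got wrong for legacy files.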
