Byte array column formatted in V4.2.3 or later causes read error in ParquetViewer
Library Version: … 4.2.3 and later. Works fine with 4.2.2 and earlier.
.NET Version: … .NET 7
OS: … Windows 11
Expected Behaviour
Parquet file containing one or more byte array columns should be readable by utilities such as ParquetViewer …
Actual Behaviour
ParquetViewer 2.4.2.0 shows an error dialog with message “cannot find data type handler to create model schema for [n:mask, t:FIXED_LEN_BYTE_ARRAY, ct: <not set>, rt: OPTIONAL, c:0]”
I have included the steps I use to format the file, in case I am using the library incorrectly. It is possible of course that there is a bug in the parsing done by Parquet Viewer, but I have no way of determining which component is at fault - I only know there has been a regression in compatibility.
This seems like an error that lots of people would notice. Let me know if you think ParquetViewer is at fault, and I can try and find some other way of checking my files. It looks like ParquetViewer uses Parquet.Net though. Thanks! …
Steps to Reproduce
Create a file that contains a column created like this: var myData = new List<byte[]>();
then for each row: byte[] myBytes = new byte[some length]; myData.Add(myBytes); …
Then to format the file:
Code Snippet
// Wrap the collected byte arrays in a column backed by a byte[] data field.
var myColumn = new DataColumn(new DataField<byte[]>("colname"), myData.ToArray());
// ParquetSchema takes fields rather than columns, so pass the column's field.
var mySchema = new ParquetSchema(myColumn.Field);
using (ParquetWriter myParquetWriter = await ParquetWriter.CreateAsync(mySchema, inRoiParquetStream, append: false).ConfigureAwait(false))
{
    // Write the single column into one row group.
    using (ParquetRowGroupWriter myGroupWriter = myParquetWriter.CreateRowGroup())
    {
        await myGroupWriter.WriteColumnAsync(myColumn).ConfigureAwait(false);
    }
}
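One way to check such a file without ParquetViewer is to read it back with Parquet.Net itself. The following is a minimal sketch, not a definitive diagnostic: the column name "mask" is taken from the error message, while the temp file name and the two 4-byte sample rows are hypothetical, and it assumes the Parquet.Net 4.x reader API (ParquetReader.CreateAsync, OpenRowGroupReader, ReadColumnAsync). A successful round trip only shows that the library can read its own output. It does not prove that other readers will accept the file.

```csharp
using System;
using System.IO;
using Parquet;
using Parquet.Data;
using Parquet.Schema;

// Hypothetical output location, used only for this illustration.
string path = Path.Combine(Path.GetTempPath(), "bytearray-roundtrip.parquet");

// Hypothetical sample data: two rows of four zero bytes each.
var field = new DataField<byte[]>("mask");
var schema = new ParquetSchema(field);
var column = new DataColumn(field, new[] { new byte[4], new byte[4] });

// Write one row group containing the byte array column.
using (Stream output = File.Create(path))
using (ParquetWriter writer = await ParquetWriter.CreateAsync(schema, output))
using (ParquetRowGroupWriter groupWriter = writer.CreateRowGroup())
{
    await groupWriter.WriteColumnAsync(column);
}

// Read the file back and report what the library finds in each row group.
using (Stream input = File.OpenRead(path))
using (ParquetReader reader = await ParquetReader.CreateAsync(input))
{
    DataField[] fields = reader.Schema.GetDataFields();
    for (int i = 0; i < reader.RowGroupCount; i++)
    {
        using ParquetRowGroupReader groupReader = reader.OpenRowGroupReader(i);
        foreach (DataField f in fields)
        {
            DataColumn readBack = await groupReader.ReadColumnAsync(f);
            Console.WriteLine($"row group {i}, {f.Name}: {readBack.Data.Length} values");
        }
    }
}
```

Running the same program against both a working and a failing library version, then opening each output file in another Parquet reader, may help narrow down whether the writer or the viewer changed behaviour.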
Issue Analytics
- State:
- Created 8 months ago
- Comments:6 (3 by maintainers)
Yeah, that’s due to how the browser handles file streams. Once I migrate to the native file access API, that should be comparable to desktop speed.
Thanks! I appreciate your prompt response. I took a look at https://parquetdbg.aloneguid.uk/ - works great for most of my parquet formats (except for the ‘todo: array’ bit), but seems to hang on one larger (12MB) file (with a column with large byte[] type). But I’m sure you know that. Anyway, thanks for all the effort, and your online tool will be great.