Write nested POCOs as Parquet

I’d like to get the bytes of the Parquet output generated from a POCO structure, using ChoParquetWriter:

byte[] bytes = ChoParquetWriter.SerializeAll<MyData>(data);

The POCO structure (IEnumerable<MyData> data, serialized as JSON):

[{
	"Health": {
		"Id": 99,
		"Status": false
	},
	"Safety": {
		"Id": 3,
		"Fire": 1
	},
	"Climate": [{
		"Id": 0,
		"State": 2
	}]
}]

MyData.cs

public class MyData
{
    public Health Health { get; set; }
    public Safety Safety { get; set; }
    public List<Climate> Climate { get; set; }
}

(MyData is actually even more nested but follows the same pattern)
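
The question shows only the top-level class. For anyone trying to reproduce the error, the nested types might look roughly like the sketch below. The property names come from the JSON sample above; the property types (int, bool) are inferred from the sample values and are assumptions, not taken from the original issue. SampleData is a hypothetical helper added purely for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical nested types, inferred from the JSON sample (types are assumptions).
public class Health
{
    public int Id { get; set; }
    public bool Status { get; set; }
}

public class Safety
{
    public int Id { get; set; }
    public int Fire { get; set; }
}

public class Climate
{
    public int Id { get; set; }
    public int State { get; set; }
}

// Repeated from the question so the sketch compiles on its own.
public class MyData
{
    public Health Health { get; set; }
    public Safety Safety { get; set; }
    public List<Climate> Climate { get; set; }
}

// Hypothetical helper that builds the same record as the JSON sample.
public static class SampleData
{
    public static List<MyData> Create()
    {
        return new List<MyData>
        {
            new MyData
            {
                Health = new Health { Id = 99, Status = false },
                Safety = new Safety { Id = 3, Fire = 1 },
                Climate = new List<Climate> { new Climate { Id = 0, State = 2 } }
            }
        };
    }
}
```

Health and Safety are single nested objects, while Climate is a list of objects. The list is the property the error message complains about.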

However, this gives an error: Parquet: CLR type ‘<redacted>.Climate’ is not supported, please specify one of 'System.DateTimeOffset, System.DateTime, Parquet.File.Values.Primitives.Interval, System.Decimal, System.Boolean, System.Byte, System.SByte, System.Int16, System.UInt16, System.Int32, System.Int64, System.Numerics.BigInteger, System.Single, System.Double, System.String, System.Byte[], , , ’ or use an alternative constructor.

If I exclude the Climate property, it all works fine. What am I missing?

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
bc3tech commented, May 31, 2023

Because the Parquet file doesn’t support a nested data format, you will need to flatten the data before storing it.

I’m confused by this, since the docs seem to say nested columns are supported.

1 reaction
Cinchoo commented, Apr 12, 2021

Because the Parquet file doesn’t support a nested data format, you will need to flatten the data before storing it.

Here is one way to handle it:

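using ChoETL;
using System.Linq;

// Read the nested JSON into MyData records, then flatten each record
// into a single-level dictionary before handing it to the Parquet writer.
using (var r = ChoJSONReader<MyData>.LoadText(json)
    .UseJsonSerialization())
{
    using (var w = new ChoParquetWriter("MyData.parquet"))
    {
        w.Write(r.Select(rec1 => rec1.ToDictionary().Flatten().ToDictionary()));
    }
}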
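
To make the flattening step concrete, here is a minimal, library-independent sketch of what "flatten before storing" can mean: nested dictionaries and lists are collapsed into one dictionary whose keys encode the path to each leaf value. FlattenSketch, its method names, and the "_" separator are all hypothetical choices for illustration; this is not the Cinchoo ETL Flatten() implementation, and the key names that library produces may differ.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper, not part of Cinchoo ETL.
public static class FlattenSketch
{
    // Copies every leaf value of a nested dictionary/list structure into
    // one flat dictionary, joining the path segments with the separator.
    public static Dictionary<string, object> Flatten(
        IDictionary<string, object> source, string separator = "_")
    {
        var result = new Dictionary<string, object>();
        Walk(source, "", separator, result);
        return result;
    }

    private static void Walk(
        object node, string prefix, string separator,
        Dictionary<string, object> result)
    {
        if (node is IDictionary<string, object> map)
        {
            // Nested object: descend into each property.
            foreach (var pair in map)
                Walk(pair.Value, Join(prefix, pair.Key, separator), separator, result);
        }
        else if (node is IList list)
        {
            // Nested list: use the element index as a path segment.
            for (int i = 0; i < list.Count; i++)
                Walk(list[i], Join(prefix, i.ToString(), separator), separator, result);
        }
        else
        {
            // Leaf value (number, bool, string, null): store under the full path.
            result[prefix] = node;
        }
    }

    private static string Join(string prefix, string key, string separator)
    {
        return prefix.Length == 0 ? key : prefix + separator + key;
    }
}
```

Applied to the sample record from the question (expressed as nested dictionaries), this sketch yields the keys Health_Id, Health_Status, Safety_Id, Safety_Fire, Climate_0_Id and Climate_0_State, each holding a primitive value. The trade-off of this approach is that list elements become separate columns, so records with lists of different lengths produce different column sets.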

