Parquet.ParquetException: 'NewField' does not exist in this file (Schema evolution with ParquetConvert.Deserialize)
Version: 3.9.0
Runtime Version: .NET Core 2.2
OS: Windows
Expected behavior
I am having an issue with schema evolution. I added a new field to my type, and now it can no longer be deserialized. Can a field be marked as optional somehow?
Actual behavior
Parquet.ParquetException: ‘NewField’ does not exist in this file
Steps to reproduce the behavior
- Serialize a collection of certain type.
- Add a field to the type.
- Deserialize using ParquetConvert.Deserialize<T>("…");
Code snippet reproducing the behavior
using (Stream fileStream = System.IO.File.OpenRead(@"C:\temp\parquet\data.parquet"))
{
    // the exception is thrown here
    positions = ParquetConvert.Deserialize<MyType>(fileStream);
}
Issue Analytics
- State:
- Created 2 years ago
- Comments:6 (2 by maintainers)
Unlikely as I don’t need it personally. Although PRs are always welcome.
I would also love to have such an attribute like [ParquetOptional].
By the way, to mimic JSON deserialization behavior, I think it would be much more expected not to throw an error and simply apply the default value when a field declared in the class does not exist in the file.
Or are there any limitations to doing that?
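Until something like that exists, one possible workaround is to inspect the file's schema before calling ParquetConvert.Deserialize. The following is only a minimal sketch, assuming the Parquet.Net 3.x low-level reader API (ParquetReader, Schema.GetDataFields); the SchemaCheck class and FileHasField helper are hypothetical names made up for illustration and are not part of the library.

```csharp
using System.IO;
using System.Linq;
using Parquet;
using Parquet.Data;

public static class SchemaCheck
{
    // Hypothetical helper: returns true when the Parquet file at 'path'
    // contains a data field (column) with the given name.
    public static bool FileHasField(string path, string fieldName)
    {
        using (Stream stream = File.OpenRead(path))
        using (var reader = new ParquetReader(stream))
        {
            // GetDataFields() lists the data columns stored in the file schema.
            return reader.Schema.GetDataFields().Any(f => f.Name == fieldName);
        }
    }
}
```

If FileHasField(path, "NewField") returns false, the file could be deserialized into a copy of the type without the new field and then mapped onto the current type, leaving NewField at its default value.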