Support Parquet data format
Overview
Parquet integration is a very popular feature request for Frictionless, and we want to support it. At the same time, the design space hasn't been explored yet, so this issue needs a design proposal. One idea is to implement it using pandas:
- https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html
Usually, integration means that we can read and write the format for a resource (internally using a `Parser`):
```python
# Read
resource = Resource('parquet-file')
resource.read_rows()
# etc

# Write
resource = Resource('table.csv')
resource.write('parquet-file')
```
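The pandas idea above could be sketched roughly as follows. This is an assumption for illustration, not the Frictionless API: pandas does the Parquet I/O, and rows come back as dicts the way `Resource.read_rows()` would expose them. It requires pandas plus a Parquet engine such as pyarrow or fastparquet.

```python
import pandas as pd

def write_parquet(rows, path):
    # rows: a list of dicts with identical keys (one dict per row)
    pd.DataFrame(rows).to_parquet(path)

def read_rows(path):
    # Returns the table as a list of {column: value} dicts
    return pd.read_parquet(path).to_dict(orient="records")
```

This delegates schema and type handling entirely to pandas; a real parser would still need to map the inferred dtypes onto a Table Schema.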
Plan
- research the Parquet format and how it can be mapped to Frictionless primitives (package/resource/schema); ping @roll to sync
- TBD
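As a starting point for the research step above, a mapping from Parquet physical/logical types to Table Schema field types might look like this. The type names and pairings below are assumptions for illustration, not a settled design:

```python
# Hypothetical lookup from Parquet types to Table Schema field types.
# Keys combine physical and logical type names where relevant.
PARQUET_TO_FRICTIONLESS = {
    "BOOLEAN": "boolean",
    "INT32": "integer",
    "INT64": "integer",
    "FLOAT": "number",
    "DOUBLE": "number",
    "BYTE_ARRAY/UTF8": "string",
    "INT32/DATE": "date",
    "INT64/TIMESTAMP": "datetime",
}

def field_type(parquet_type):
    # Fall back to "any" for types without an obvious mapping
    return PARQUET_TO_FRICTIONLESS.get(parquet_type, "any")
```

Nested Parquet structures (lists, maps, structs) don't fit Table Schema scalar types directly and would need a separate decision, which is part of why this issue needs a design proposal.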
Issue Analytics
- State:
- Created 3 years ago
- Reactions: 17
- Comments: 14 (8 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
It’s been implemented in v5 (#1186) (will be released this month)
I also created a follow-up issue - https://github.com/frictionlessdata/frictionless-py/issues/1203
I think we’d make use of this in @catalyst-cooperative / PUDL for publishing our long tables.