Support Parquet data format
Overview
Parquet integration is a very popular feature request for Frictionless, and we want to support it. At the same time, the design space hasn't been explored yet, so this issue needs a design proposal. One idea is to implement it using pandas:
- https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html
Usually, integration means that we can read and write the format for a resource (internally using a `Parser`):
```python
# Read
resource = Resource('parquet-file')
resource.read_rows()
# etc

# Write
resource = Resource('table.csv')
resource.write('parquet-file')
```
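The pandas idea above could be sketched roughly as follows. This is an assumption for illustration, not the Frictionless API: pandas does the Parquet I/O, and rows come back as dicts the way `Resource.read_rows()` would expose them. It requires pandas plus a Parquet engine such as pyarrow or fastparquet.

```python
import pandas as pd

def write_parquet(rows, path):
    # rows: a list of dicts with identical keys (one dict per row)
    pd.DataFrame(rows).to_parquet(path)

def read_rows(path):
    # Returns the table as a list of {column: value} dicts
    return pd.read_parquet(path).to_dict(orient="records")
```

This delegates schema and type handling entirely to pandas; a real parser would still need to map the inferred dtypes onto a Table Schema.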
Plan
- research the Parquet format and how it can be mapped to Frictionless primitives (package/resource/schema); ping @roll to sync
- TBD
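As a starting point for the research step above, a mapping from Parquet physical/logical types to Table Schema field types might look like this. The type names and pairings below are assumptions for illustration, not a settled design:

```python
# Hypothetical lookup from Parquet types to Table Schema field types.
# Keys combine physical and logical type names where relevant.
PARQUET_TO_FRICTIONLESS = {
    "BOOLEAN": "boolean",
    "INT32": "integer",
    "INT64": "integer",
    "FLOAT": "number",
    "DOUBLE": "number",
    "BYTE_ARRAY/UTF8": "string",
    "INT32/DATE": "date",
    "INT64/TIMESTAMP": "datetime",
}

def field_type(parquet_type):
    # Fall back to "any" for types without an obvious mapping
    return PARQUET_TO_FRICTIONLESS.get(parquet_type, "any")
```

Nested Parquet structures (lists, maps, structs) don't fit Table Schema scalar types directly and would need a separate decision, which is part of why this issue needs a design proposal.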
Issue Analytics
- State:
- Created 3 years ago
- Reactions: 17
- Comments: 14 (8 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
It’s been implemented in v5 (#1186) (will be released this month)
I also created a follow-up issue - https://github.com/frictionlessdata/frictionless-py/issues/1203
I think we’d make use of this in @catalyst-cooperative / PUDL for publishing our long tables.