Support Parquet data format

Overview

Parquet integration is a very popular feature request for Frictionless, and we want to support it. However, the design has not been worked out yet, so this issue requires a design proposal. One idea is to implement it using pandas - https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html

Usually, integration means that we can do the following with a resource (internally using a Parser):

from frictionless import Resource

# Read
resource = Resource('parquet-file')
resource.read_rows()
# etc

# Write
resource = Resource('table.csv')
resource.write('parquet-file')
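To make the "internally using Parser" idea concrete, here is a minimal sketch of how a resource could pick a parser by file extension. Every name in it (`Parser`, `CsvParser`, `ParquetParser`, `PARSERS`, `get_parser`) is hypothetical and is not the Frictionless API; the Parquet parser is left as a stub because the design is still open.

```python
import csv
import io
import os
from typing import Dict, List, Type


class Parser:
    """Hypothetical base class: converts raw bytes to rows and back."""

    def read_rows(self, data: bytes) -> List[dict]:
        raise NotImplementedError

    def write_rows(self, rows: List[dict]) -> bytes:
        raise NotImplementedError


class CsvParser(Parser):
    """Stand-in parser built on the standard library csv module."""

    def read_rows(self, data: bytes) -> List[dict]:
        text = io.StringIO(data.decode("utf-8"), newline="")
        return list(csv.DictReader(text))

    def write_rows(self, rows: List[dict]) -> bytes:
        if not rows:
            return b""
        buffer = io.StringIO(newline="")
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue().encode("utf-8")


class ParquetParser(Parser):
    """Stub: a real version might delegate to pandas (an assumption)."""


# Hypothetical registry mapping a file extension to a parser class.
PARSERS: Dict[str, Type[Parser]] = {
    "csv": CsvParser,
    "parquet": ParquetParser,
}


def get_parser(path: str) -> Parser:
    """Choose a parser from the file extension of the given path."""
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    if extension not in PARSERS:
        raise ValueError(f"no parser registered for {path!r}")
    return PARSERS[extension]()
```

With a layout like this, adding Parquet support would mean filling in the two `ParquetParser` methods while the read and write calls on the resource stay unchanged.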

Plan

  • Research what the Parquet format is and how it can be mapped to Frictionless primitives (package/resource/schema); ping @roll to sync
  • TBD
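As a starting point for the mapping research, the sketch below shows one possible way to translate Parquet column types into Table Schema field types. The type names on both sides are real, but the mapping itself, the `(name, physical, logical)` column tuples, and the `to_table_schema` helper are assumptions made for illustration, not a decided design.

```python
from typing import Dict, List, Optional, Tuple

# Assumed mapping from Parquet physical types to Table Schema field types.
PHYSICAL_TO_FIELD_TYPE: Dict[str, str] = {
    "BOOLEAN": "boolean",
    "INT32": "integer",
    "INT64": "integer",
    "FLOAT": "number",
    "DOUBLE": "number",
    "BYTE_ARRAY": "string",
}

# Assumed mapping from Parquet logical types, which refine a physical type.
LOGICAL_TO_FIELD_TYPE: Dict[str, str] = {
    "STRING": "string",
    "DATE": "date",
    "TIME": "time",
    "TIMESTAMP": "datetime",
    "DECIMAL": "number",
}

# Hypothetical column description: (name, physical type, logical type or None).
Column = Tuple[str, str, Optional[str]]


def to_table_schema(columns: List[Column]) -> dict:
    """Build a Table Schema descriptor from simplified column descriptions."""
    fields = []
    for name, physical, logical in columns:
        # Prefer the logical type; fall back to the physical type, then "any".
        field_type = LOGICAL_TO_FIELD_TYPE.get(logical) if logical else None
        if field_type is None:
            field_type = PHYSICAL_TO_FIELD_TYPE.get(physical, "any")
        fields.append({"name": name, "type": field_type})
    return {"fields": fields}
```

Nested Parquet columns are not covered here; they would likely need the `object` or `array` field types and are part of what the research step has to settle.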

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 17
  • Comments: 14 (8 by maintainers)

Top GitHub Comments

3 reactions
roll commented, Jul 22, 2022

It’s been implemented in v5 (#1186) (will be released this month)

I also created a follow-up issue - https://github.com/frictionlessdata/frictionless-py/issues/1203

3 reactions
zaneselvans commented, Oct 21, 2020

I think we’d make use of this in @catalyst-cooperative / PUDL for publishing our long tables.
