Read Parquet file with `fastparquet` instead of Arrow
Hello Modin team,

I'm trying to read a Parquet file using `modin.pandas.read_parquet("example.parquet", engine="fastparquet")`. In my environment, only `fastparquet` is installed.

`read_parquet` raises:

ImportError: Missing optional dependency 'pyarrow'. pyarrow is required to read parquet files.

However, if I remove lines 606-609 from `parquet_dispatcher.py`, the read works fine, which I believe is because Modin supports both `fastparquet` and `pyarrow`.

Can we also check for `fastparquet` here? Or is there a specific reason not to do so?

Thank you in advance for answering my question.
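
To make the request concrete, here is a minimal sketch of what an engine-aware dependency check could look like. It is not Modin's actual code: `resolve_parquet_engine` and its `is_installed` parameter are hypothetical names used only for illustration, and the assumption is that the check should require only the engine the caller asked for.

```python
import importlib.util
from typing import Callable


def _is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def resolve_parquet_engine(
    engine: str = "auto",
    is_installed: Callable[[str], bool] = _is_installed,
) -> str:
    """Hypothetical helper: pick a Parquet engine and verify it is available.

    Only the requested engine is required, so a fastparquet-only
    environment is not rejected for lacking pyarrow.
    """
    supported = ("pyarrow", "fastparquet")

    if engine == "auto":
        # Try the engines in order and use the first one that is installed.
        for candidate in supported:
            if is_installed(candidate):
                return candidate
        raise ImportError(
            "Reading Parquet files requires either 'pyarrow' or 'fastparquet'."
        )

    if engine not in supported:
        raise ValueError(f"engine must be 'auto' or one of {supported}")

    if not is_installed(engine):
        raise ImportError(f"Missing optional dependency '{engine}'.")
    return engine
```

With a check shaped like this, `resolve_parquet_engine("fastparquet")` would succeed in an environment where only `fastparquet` is installed, and an `ImportError` would be raised only when the engine actually requested is missing.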
Issue Analytics
- State:
- Created 10 months ago
- Comments:5 (5 by maintainers)
@trgiangdo the next planned release is 0.18.0, scheduled for December 7. After #5285 is merged and before the next release comes out, you can install from master or from a specific commit to get your fix.
We could merge in the fix to master and you could work off the latest master if that would work for your codebase? I also think the next minor release should be coming soon.