Read Parquet file with `fastparquet` instead of Arrow

Hello Modin team,

I’m trying to read a Parquet file using:

modin.pandas.read_parquet("example.parquet", engine="fastparquet")

In my environment, only fastparquet is installed. The read_parquet call raises:

ImportError: Missing optional dependency ‘pyarrow’. pyarrow is required to read parquet files.

However, if I remove lines 606-609 from parquet_dispatcher.py, the read works fine, which I would expect since I believe Modin supports both fastparquet and pyarrow.

Can we also check for fastparquet here? Or is there any specific reason to not do so?
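To illustrate the kind of check being asked for, here is a minimal sketch of an engine-aware dependency check. It is not Modin's actual code: `resolve_parquet_engine`, `SUPPORTED_ENGINES`, and the `is_available` parameter are hypothetical names introduced for this example. The idea is to raise ImportError only when the engine that was actually requested is missing, and to fall back from pyarrow to fastparquet when the engine is "auto" (the same order pandas tries).

```python
from importlib.util import find_spec
from typing import Callable

# Hypothetical constant: engines tried in order when engine="auto".
SUPPORTED_ENGINES = ("pyarrow", "fastparquet")


def _module_installed(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return find_spec(name) is not None


def resolve_parquet_engine(
    engine: str = "auto",
    is_available: Callable[[str], bool] = _module_installed,
) -> str:
    """Pick a Parquet engine, failing only if the needed one is missing.

    Hypothetical helper, not part of Modin. `is_available` is injectable
    so the logic can be exercised without installing either library.
    """
    if engine == "auto":
        # Use the first installed engine, preferring pyarrow.
        for candidate in SUPPORTED_ENGINES:
            if is_available(candidate):
                return candidate
        raise ImportError(
            "Reading Parquet files requires either 'pyarrow' or 'fastparquet'."
        )
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"engine must be 'auto' or one of {SUPPORTED_ENGINES}")
    if not is_available(engine):
        # Complain about the engine the caller asked for, not always pyarrow.
        raise ImportError(f"Missing optional dependency '{engine}'.")
    return engine
```

With a check along these lines, a call with engine="fastparquet" in an environment that only has fastparquet would pass the dependency check instead of failing on a missing pyarrow.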

Thank you in advance for answering my question.

Issue Analytics

  • State: closed
  • Created: 10 months ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

mvashishtha commented, Dec 1, 2022

@trgiangdo the next planned release is 0.18.0, scheduled for December 7. After #5285 is merged and before the next release comes out, you can install from master or from a specific commit to get your fix.

pyrito commented, Nov 30, 2022

We could merge in the fix to master and you could work off the latest master if that would work for your codebase? I also think the next minor release should be coming soon.

Top Results From Across the Web

fastparquet — fastparquet 0.7.1 documentation - Read the Docs
This package aims to provide a performant library to read and write Parquet files from Python, without any need for a Python-Java bridge....

Reading and Writing the Apache Parquet Format
Apache Arrow is an ideal in-memory transport layer for data that is being read or written with Parquet files. We have been concurrently...

How fast is reading Parquet file (with Arrow) vs. CSV with ...
A focused study on the speed comparison of reading parquet files using PyArrow vs. reading identical CSV files with Pandas.

python - A comparison between fastparquet and pyarrow?
I used both fastparquet and pyarrow for converting protobuf data to parquet and to query the same in S3 using Athena.

fastparquet Documentation - Read the Docs
1. read and write Parquet files, in single or multiple-file format ... be successfully recreated only in fastparquet and (py)arrow.
