
Add abstraction layer for Parquet files to support multiple reader/writer libraries


The IO implementation in dask.dataframe.io.parquet is coupled to fastparquet. I would like to enable users to elect to use pyarrow.parquet if it is available. This would work by writing:

read_parquet(..., driver='pyarrow')

or

read_parquet(..., driver='fastparquet')

This should also help encourage semantic conformity, per the discussion in #2113.

cc @cpcloud @jreback

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
wesm commented, Mar 28, 2017

Very good. I’ll start putting together a patch and keep you posted.

0 reactions
jcrist commented, Nov 13, 2017

This is now complete: both fastparquet and pyarrow are supported for reading and writing as of #2868. Closing.
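Since the final change covers writing as well as reading, the abstraction can also be expressed as a small interface that each backend implements. The sketch below assumes that shape. `ParquetEngine`, `PyArrowEngine`, `FastParquetEngine` and `get_engine` are hypothetical names, not the classes dask ships (the keyword dask exposes is `engine` rather than the `driver` proposed in the issue). It relies only on `pyarrow.parquet.read_table`/`write_table`, `pyarrow.Table.from_pandas`, `fastparquet.ParquetFile(...).to_pandas` and `fastparquet.write`, each imported lazily.

```python
from abc import ABC, abstractmethod


class ParquetEngine(ABC):
    """Hypothetical interface that every Parquet backend implements."""

    @abstractmethod
    def read(self, path, columns=None):
        """Return a pandas DataFrame read from `path`."""

    @abstractmethod
    def write(self, df, path):
        """Write the pandas DataFrame `df` to `path`."""


class PyArrowEngine(ParquetEngine):
    def read(self, path, columns=None):
        import pyarrow.parquet as pq  # lazy import: pyarrow only needed if used
        return pq.read_table(path, columns=columns).to_pandas()

    def write(self, df, path):
        import pyarrow as pa
        import pyarrow.parquet as pq
        pq.write_table(pa.Table.from_pandas(df), path)


class FastParquetEngine(ParquetEngine):
    def read(self, path, columns=None):
        import fastparquet  # lazy import: fastparquet only needed if used
        return fastparquet.ParquetFile(path).to_pandas(columns=columns)

    def write(self, df, path):
        import fastparquet
        fastparquet.write(path, df)


# Hypothetical lookup table: engine name -> engine class.
_ENGINES = {"pyarrow": PyArrowEngine, "fastparquet": FastParquetEngine}


def get_engine(name):
    """Instantiate the engine registered under `name`."""
    cls = _ENGINES.get(name)
    if cls is None:
        raise ValueError(f"Unknown engine {name!r}; expected one of {sorted(_ENGINES)}")
    return cls()
```

A caller would then write `get_engine("pyarrow").read("data.parquet", columns=["a"])`; changing the string is the only edit needed to switch libraries, and the abstract base class makes a backend that forgets `read` or `write` fail at instantiation time.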
