Add abstraction layer for Parquet files to support multiple reader/writer libraries
The IO implementation in dask.dataframe.io.parquet
is coupled to fastparquet. I would like to enable users to elect to use pyarrow.parquet
if it is available. This would work by writing:
read_parquet(..., driver='pyarrow')
or
read_parquet(..., driver='fastparquet')
This should also help encourage semantic conformity, per the discussion in #2113.
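One way the driver selection could work is sketched below. This is only an illustration, not the dask implementation: the names `DRIVER_MODULES`, `is_installed`, and `resolve_driver` are hypothetical, and falling back to the first installed library when no driver is given is an assumption that the issue does not specify.

```python
import importlib.util

# Hypothetical registry mapping each driver name to the top-level module
# that must be importable for that driver to work.
DRIVER_MODULES = {
    "fastparquet": "fastparquet",
    "pyarrow": "pyarrow",
}


def is_installed(module_name):
    """Return True if the top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def resolve_driver(driver=None, installed=is_installed):
    """Return the name of the Parquet driver to use.

    With driver=None, the first installed library in registry order is
    chosen (an assumed default). An explicit driver must be both known
    and installed.
    """
    if driver is None:
        for name, module in DRIVER_MODULES.items():
            if installed(module):
                return name
        raise ImportError(
            "No Parquet library found; install fastparquet or pyarrow"
        )
    if driver not in DRIVER_MODULES:
        raise ValueError(
            f"Unknown driver {driver!r}; expected one of {sorted(DRIVER_MODULES)}"
        )
    if not installed(DRIVER_MODULES[driver]):
        raise ImportError(f"Driver {driver!r} requested but it is not installed")
    return driver
```

A read_parquet wrapper could call `resolve_driver(driver)` first and then hand off to the matching reader, so the rest of the IO code never imports either library directly.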
Issue Analytics
- Created 6 years ago
- Comments: 7 (7 by maintainers)
Top GitHub Comments
Very good. I’ll start putting together a patch and keep you posted.
This is now completed: both fastparquet and pyarrow are supported for reading and writing operations as of #2868. Closing.