read partitioned parquet directories
Hi, can I read a partitioned parquet file (which is a tree of directories) WITHOUT
a metadata file? I get the parquet collection from Spark.
For example:
test.parq
├─date=20150105
├─date=20150106
├─date=20150107
which contains 3 partitions.
Thanks.
Issue Analytics
- State:
- Created 7 years ago
- Reactions:7
- Comments:13 (8 by maintainers)
If you don’t need an index (and it seems you don’t, or maybe don’t even have a column that is appropriate), you can use `infer_divisions=False`, which should skip gathering metadata from all of the files before constructing the graph. In general, though, the size of each partition will be very important to performance, and you might want to create your data with larger ones, if you have the memory to spare.

I haven’t tried this, but you might be able to use `merge` in the directory above the partitions, passing the relative paths of all of the parquet files, which would then build the metadata file. There is no specific way to read a set of isolated parquet files.
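For the file-collection step of the `merge` suggestion, here is a minimal sketch using only the standard library. It assumes Spark wrote data files ending in `.parquet` under `key=value` directories, and that marker files such as `_SUCCESS` or hidden `.crc` files should be skipped. The helper names `list_partition_files` and `partition_values` are made up for illustration, and the final `merge` call is left as a comment because it is untested here.

```python
import os


def list_partition_files(root, suffix=".parquet"):
    """Collect data files under `root`, as sorted paths relative to `root`.

    Files starting with "_" or "." (e.g. _SUCCESS, .crc checksums) are skipped.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith(("_", ".")) or not name.endswith(suffix):
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            # Normalise separators so the result looks the same on every OS.
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)


def partition_values(rel_path):
    """Parse key=value directory segments, e.g. 'date=20150105/x.parquet'."""
    values = {}
    for segment in rel_path.split("/")[:-1]:
        if "=" in segment:
            key, value = segment.split("=", 1)
            values[key] = value
    return values


# Hypothetical usage, run from inside the dataset directory (test.parq):
#   files = list_partition_files(".")
#   from fastparquet.writer import merge   # assumption: untested in this thread
#   merge(files)                           # should write the metadata file
```

Partition values are kept as strings here; whether they should be cast to dates or integers depends on how the reader interprets the directory names.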