read partitioned parquet directories

Hi, can I read a partitioned parquet file (which is a tree of directories) WITHOUT a metadata file? I get the parquet collection from Spark. For example:

test.parq
├─date=20150105
├─date=20150106
├─date=20150107

which contains 3 partitions. Thanks.

Issue Analytics

State:
Created 7 years ago
Reactions:7
Comments:13 (8 by maintainers)

Top GitHub Comments

martindurant commented, Apr 23, 2019

If you don’t need an index (and it seems you don’t, or maybe don’t even have a column that is appropriate), you can use infer_divisions=False, which should skip gathering metadata from all of the files before constructing the graph. In general, though, the size of each partition will be very important to performance, and you might want to create your data with larger ones, if you have the memory to spare.

martindurant commented, Feb 27, 2017

I haven’t tried this, but you might be able to use merge in the directory above the partitions, passing the relative paths of all of the parquet files, which would then build the metadata file. There is no specific way to read a set of isolated parquet files.
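As a rough sketch of the first half of that suggestion, the relative paths of the data files can be gathered with the standard library alone. The helper name `collect_parquet_parts` is hypothetical, and the sketch assumes that data files end in `.parquet` and that Spark's bookkeeping files start with `_` or `.`; neither assumption comes from the thread.

```python
import os


def collect_parquet_parts(root):
    """Return sorted paths, relative to `root`, of the data files below it.

    Hypothetical helper: assumes data files end in ".parquet" and that
    bookkeeping files (e.g. _SUCCESS, hidden .crc files) start with "_" or ".".
    """
    parts = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit partition directories in a stable order
        for name in sorted(filenames):
            if name.startswith(("_", ".")):
                continue  # skip bookkeeping and hidden files
            if name.endswith(".parquet"):
                full_path = os.path.join(dirpath, name)
                parts.append(os.path.relpath(full_path, root))
    return parts


# The resulting list is what one would hand to the merge function mentioned
# above (presumably fastparquet's; it is deliberately not called here, since
# the comment itself says the approach is untested).
```

Calling `collect_parquet_parts("test.parq")` from the directory above the dataset would yield entries of the form `date=20150105/<file name>`, one per data file.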

Top Results From Across the Web

Read Parquet Files from Nested Directories - Kontext

Read Parquet Files from Nested Directories ... Spark supports partition discovery to read data that is stored in partitioned directories. For the ...

Reading DataFrame from partitioned parquet file

sqlContext.read.parquet can take multiple paths as input. If you want just day=5 and day=6 , you can simply add two paths like:

How to write and read multiple Parquet files - Deephaven

This guide will show you how to read a directory of similar Parquet files into a Deephaven table, supplying just the directory path,...

parquet file to include partitioned column in file

In my case the parquet file is to be read by external consumers and they expect the coutryCode column in file. Is there...

Parquet Files - Spark 2.4.0 Documentation

In a partitioned table, data are usually stored in different directories, with partitioning column values encoded in the path of each partition directory....
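To illustrate what "encoded in the path" means for a layout like the `date=20150105` directories in the question, here is a minimal sketch that recovers partition values from a path. The function name `parse_partition_values` is hypothetical, values are kept as strings (no type inference is attempted), and any path segment containing `=` is treated as a partition directory.

```python
def parse_partition_values(path):
    """Extract key=value partition segments from a path into a dict.

    Hypothetical helper: every segment of the form "key=value" is treated as
    a partition directory; values are returned as strings.
    """
    values = {}
    for segment in path.replace("\\", "/").split("/"):
        key, sep, value = segment.partition("=")
        if sep and key:  # only segments that actually contain "key=..."
            values[key] = value
    return values


print(parse_partition_values("test.parq/date=20150105/part-0.parquet"))
# {'date': '20150105'}
```

Readers that support partition discovery do something along these lines and expose `date` as an extra column, which is why the column does not need to be stored inside the files themselves.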