
read partitioned parquet directories

See original GitHub issue

Hi, can I read a partitioned parquet dataset (a tree of directories) WITHOUT a metadata file? I get the parquet collection from Spark. For example:

test.parq
├── date=20150105
├── date=20150106
└── date=20150107

which contains 3 partitions. Thanks.

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Reactions: 7
  • Comments: 13 (8 by maintainers)

Top GitHub Comments

1 reaction
martindurant commented, Apr 23, 2019

If you don’t need an index (and it seems you don’t, or maybe don’t even have a column that is appropriate), you can use infer_divisions=False, which should skip gathering metadata from all of the files before constructing the graph. In general, though, the size of each partition will be very important to performance, and you might want to create your data with larger ones, if you have the memory to spare.

1 reaction
martindurant commented, Feb 27, 2017

I haven’t tried this, but you might be able to use merge in the directory above the partitions, passing the relative paths of all of the parquet files, which then builds the metadata file. There is no specific way to read a set of isolated parquet files.

Read more comments on GitHub.

Top Results From Across the Web

  • Read Parquet Files from Nested Directories - Kontext: Spark supports partition discovery to read data that is stored in partitioned directories. ...
  • Reading DataFrame from partitioned parquet file: sqlContext.read.parquet can take multiple paths as input. If you want just day=5 and day=6, you can simply add two paths. ...
  • How to write and read multiple Parquet files - Deephaven: This guide will show you how to read a directory of similar Parquet files into a Deephaven table, supplying just the directory path. ...
  • parquet file to include partitioned column in file: In my case the parquet file is to be read by external consumers and they expect the coutryCode column in file. ...
  • Parquet Files - Spark 2.4.0 Documentation: In a partitioned table, data are usually stored in different directories, with partitioning column values encoded in the path of each partition directory. ...
