Push down filters for SHOW PARTITIONS
We need to pass a Constraint to Metadata.getLayouts() in two places:
- ShowQueriesRewrite
- InformationSchemaPageSourceProvider

This allows the connector to filter for a query like this:
SHOW PARTITIONS FROM orders WHERE orderdate = '2016-02-03'
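As a rough, self-contained sketch of the idea (the types below are simplified stand-ins I made up, not the actual Presto SPI classes such as Constraint, Metadata.getLayouts(), or the connector layout types), passing the WHERE predicate down lets the connector prune partitions while listing them instead of returning every partition to the engine:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-ins for the real SPI types -- just enough
// to show why pushing the predicate down to the connector helps.
public class ShowPartitionsPushdownSketch
{
    // A partition is identified by its partition-key values, e.g. {orderdate=2016-02-03}.
    record Partition(Map<String, String> keys) {}

    // The constraint derived from: SHOW PARTITIONS FROM orders WHERE orderdate = '2016-02-03'
    record Constraint(Map<String, String> fixedValues)
    {
        boolean matches(Partition partition)
        {
            return fixedValues.entrySet().stream()
                    .allMatch(e -> e.getValue().equals(partition.keys().get(e.getKey())));
        }
    }

    // Without pushdown the connector must return ALL partitions and let the engine filter;
    // with the constraint it can prune while listing them.
    static List<Partition> listPartitions(List<Partition> allPartitions, Constraint constraint)
    {
        return allPartitions.stream()
                .filter(constraint::matches)
                .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        List<Partition> all = List.of(
                new Partition(Map.of("orderdate", "2016-02-02")),
                new Partition(Map.of("orderdate", "2016-02-03")));

        Constraint constraint = new Constraint(Map.of("orderdate", "2016-02-03"));
        System.out.println(listPartitions(all, constraint)); // prints only the 2016-02-03 partition
    }
}
```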
Hive partition columns can be of any type, not just strings.
In ShowQueriesRewrite.visitShowPartitions, what we really need are just the partition columns. Thus, instead of calling metadata.getLayouts, we can add another method, getDiscretePredicateColumns, to Metadata.
For the execution of the rewritten query, we will change the schema of __internal_partitions__ to have a fixed number of partition columns (e.g. 10). This means we would not support more than 10 partition columns in Hive, but it allows predicate pushdown to avoid fetching all the partitions.
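As an illustration of that fixed-width layout (the column names and the plain string list here are mine, not the actual __internal_partitions__ schema; per the note above about partition columns not being strings, the real value columns would need per-table types):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Illustrative only: generate a fixed-width column list for __internal_partitions__,
// capped at MAX_PARTITION_COLUMNS so a partition predicate can target a concrete column.
public class InternalPartitionsSchemaSketch
{
    static final int MAX_PARTITION_COLUMNS = 10;   // the "e.g. 10" limit from the comment above

    static List<String> partitionValueColumnNames()
    {
        return IntStream.rangeClosed(1, MAX_PARTITION_COLUMNS)
                .mapToObj(i -> "partition_value_" + i)
                .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        // Prints partition_value_1 ... partition_value_10
        partitionValueColumnNames().forEach(System.out::println);
    }
}
```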