Hive Connector: Ignore empty files when scanning files
See original GitHub issue.

Currently, if a table (an external table) is created with a file format such as Parquet and its directory contains an empty (zero-length) file, an error occurs:
oss://xxxxxx/xxxx/empty_file is not a valid Parquet File
For ease of use, we could safely skip all empty files to avoid this error. What do you think? (The empty-file case does not occur when the files are managed by Hive, but it can occur when files are uploaded by a user or another program, for example when using an object storage service.)
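One way the proposal could look, as a minimal sketch rather than the actual connector code: filter out zero-length files while listing the table directory, before any splits are handed to the Parquet reader. The sketch below uses the Hadoop FileSystem API (hadoop-common on the classpath); the class and method names are hypothetical and not part of Presto or Hadoop.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Hypothetical helper: list the data files of a table location and drop
 * zero-length files so they never reach the Parquet reader.
 */
public final class EmptyFileFilter
{
    private EmptyFileFilter() {}

    public static List<FileStatus> listNonEmptyFiles(FileSystem fileSystem, Path tableLocation)
            throws IOException
    {
        List<FileStatus> nonEmpty = new ArrayList<>();
        for (FileStatus status : fileSystem.listStatus(tableLocation)) {
            // Skip zero-length files: they cannot contain a Parquet footer, so
            // trying to read them fails with "... is not a valid Parquet File".
            if (status.isFile() && status.getLen() > 0) {
                nonEmpty.add(status);
            }
        }
        return nonEmpty;
    }
}
```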
Issue Analytics
- State:
- Created: 3 years ago
- Comments: 9 (7 by maintainers)
Top Results From Across the Web
Hive connector — Trino 403 Documentation
Ignore partitions when the file system location does not exist rather than failing the query. This skips data that may be expected to...
Hive Connector — Presto 0.278 Documentation
The Hive connector allows querying data stored in a Hive data warehouse. Hive is a combination of three components: Data files in varying...
How to ignore empty parquet files when reading using Hive
Try to use the property $file_size. If it is more than 0 then process the data load. It would be better if you... (see the query sketch after this list)
E-MapReduce: Hive connector - Alibaba Cloud
Specifies whether to ignore a partition rather than report a query failure if the system file path specified for the partition does not...
Reading and writing Hive tables in R | CDP Public Cloud
The Hive Warehouse Connector (HWC) supports reads and writes to Apache Hive managed ACID tables in R. Cloudera provides an R package SparklyrHWC...
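The StackOverflow suggestion above (filtering on $file_size) can also be applied from the query side. Below is a hedged sketch, assuming a Trino/Presto coordinator reachable over JDBC, the Trino JDBC driver on the classpath, and a Hive connector table that exposes the hidden "$file_size" column; the host, user, and table names are placeholders. Whether this actually prevents the empty file from being opened depends on whether the engine prunes splits using that predicate.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * Query-side workaround sketch: restrict a scan to rows that come from
 * non-empty files via the Hive connector's hidden "$file_size" column.
 * All connection details and table names below are placeholders.
 */
public final class SkipEmptyFilesQuery
{
    public static void main(String[] args)
            throws SQLException
    {
        String url = "jdbc:trino://coordinator.example.com:8080/hive/default";
        try (Connection connection = DriverManager.getConnection(url, "example_user", null);
                Statement statement = connection.createStatement();
                ResultSet rows = statement.executeQuery(
                        "SELECT * FROM example_table WHERE \"$file_size\" > 0")) {
            while (rows.next()) {
                // Print the first column of each row; adjust for the real schema.
                System.out.println(rows.getString(1));
            }
        }
    }
}
```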
Top GitHub Comments
hi @xumingming @mbasmanova, I feel that if a Hive connector table's metadata specifies the Parquet format, then all of its files should be in Parquet format; empty files seem like an invalid case. I am not convinced that the ParquetReader should skip empty non-Parquet files. What do you think?
OK, sounds good to me 😃
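For context on the comment above that empty files are simply invalid Parquet: the format requires the 4-byte PAR1 magic at both the start and the end of the file, plus a 4-byte footer-length field, so any file shorter than 12 bytes (and in particular a zero-length file) cannot be a valid Parquet file. The standalone sketch below illustrates that minimum-length and trailing-magic check; the class and method names are hypothetical and not part of parquet-mr or Presto.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical standalone check showing why a zero-length file can never be
 * valid Parquet: the format needs "PAR1" at the start and end plus a 4-byte
 * footer-length field, so anything shorter than 12 bytes is structurally invalid.
 */
public final class ParquetFileCheck
{
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);
    private static final int MIN_LENGTH = MAGIC.length + 4 + MAGIC.length;

    private ParquetFileCheck() {}

    public static boolean looksLikeParquet(String path)
            throws IOException
    {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            if (file.length() < MIN_LENGTH) {
                // Empty (zero-length) files always land here, which is what
                // surfaces as the "is not a valid Parquet File" error.
                return false;
            }
            byte[] trailer = new byte[MAGIC.length];
            file.seek(file.length() - MAGIC.length);
            file.readFully(trailer);
            return Arrays.equals(trailer, MAGIC);
        }
    }
}
```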