Error reading Parquet file
Query:
select internal_id from hive.db.table limit 10;
Output:
Query 20190821_080531_00002_dj2ve failed: com.facebook.presto.spi.type.VarcharType
The same query runs without errors in Hive and in Athena.
Table create:
CREATE TABLE hive.db.table (
...
internal_id varchar,
...
)
Parquet schema:
{
"type": "record",
"name": "flatschema",
"namespace": "ns",
"fields": [ {
"name": "internal_id",
"type": ["null", "string"],
"default": null
}]
}
Version:
presto-cli --version
Presto CLI 0.220
Issue Analytics
- Created 4 years ago
- Comments:7
Top GitHub Comments
@babrar and I have set this one up on all nodes, workers and coordinator. We have since had a case where a new column suddenly appeared at a different position in the Parquet file than in the Hive metastore, and it did not cause an issue.
There is a Stack Overflow thread for this problem: https://stackoverflow.com/questions/51918860/unable-to-query-parquet-data-with-nested-fields-in-presto-db
Problem: Presto is reading data based on the column indices instead of the column names.
Solution: Add
hive.parquet.use-column-names=true
to your hive.properties file to force Presto to read data by column name instead of by column index.
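
To see why the lookup mode matters, here is a minimal Python sketch of the two strategies. It is a toy model, not Presto's reader code: the helper functions are hypothetical, and every column name except internal_id is made up for illustration. It assumes the Parquet file stores its columns in a different order than the table definition.

```python
# Toy model of the two lookup strategies; this is NOT Presto's actual reader.

def resolve_by_index(table_columns, file_columns, wanted):
    """Return the file column at the same position as `wanted` in the table."""
    position = table_columns.index(wanted)
    if position < len(file_columns):
        return file_columns[position]
    return None


def resolve_by_name(table_columns, file_columns, wanted):
    """Return the file column whose name matches `wanted`, if present."""
    return wanted if wanted in file_columns else None


# Hypothetical layouts: only internal_id comes from the issue, the rest is invented.
table_columns = ["created_at", "internal_id", "amount"]   # order in the metastore
file_columns = ["internal_id", "amount", "created_at"]    # order in the Parquet file

by_index = resolve_by_index(table_columns, file_columns, "internal_id")
by_name = resolve_by_name(table_columns, file_columns, "internal_id")

print("by index:", by_index)  # picks a different column than the one requested
print("by name: ", by_name)   # picks the requested column
```

With index-based lookup the reader can end up decoding a column of a different type as varchar, which would be consistent with the type error above. Name-based lookup does not depend on column order.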