Error reading Parquet file

Query:

select internal_id from hive.db.table limit 10;

Output:

Query 20190821_080531_00002_dj2ve failed: com.facebook.presto.spi.type.VarcharType

The same query runs fine in Hive and in Athena.

Table definition:

CREATE TABLE hive.db.table (
   ...
   internal_id varchar,
   ...
)

Parquet schema:

{
  "type": "record",
  "name": "flatschema",
  "namespace": "ns",
  "fields": [ {
    "name": "internal_id",
    "type": ["null", "string"],
    "default": null
  }]
}

Version:

presto-cli --version
Presto CLI 0.220

Top GitHub Comments

ioah86 commented, Aug 23, 2019

@babrar and I have set this property on all nodes, both the workers and the coordinator. Since then we have had a case where a new column suddenly appeared at a different position in the Parquet files than in the Hive metastore, and it caused no problems.

babrar commented, Aug 21, 2019

There is a Stack Overflow thread for this problem: https://stackoverflow.com/questions/51918860/unable-to-query-parquet-data-with-nested-fields-in-presto-db

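Problem: By default, Presto maps Parquet columns to table columns by position (column index) rather than by name. If the column order in the Parquet files differs from the order in the Hive table definition, Presto reads the wrong column, which can surface as type errors like the VarcharType failure above.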

Solution: Add hive.parquet.use-column-names=true to your Hive catalog file (hive.properties) so that Presto matches Parquet columns by name instead of by index.
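A minimal sketch of the change, assuming the catalog is named hive (as in hive.db.table above) and sits in Presto's default catalog directory, etc/catalog/. The connector.name and hive.metastore.uri lines are placeholders for whatever your catalog file already contains; only the last line is new.

```properties
# etc/catalog/hive.properties
# Placeholder values: keep whatever your existing catalog file already uses.
connector.name=hive-hadoop2
hive.metastore.uri=thrift://example.net:9083

# Match Parquet columns to table columns by name rather than by position.
hive.parquet.use-column-names=true
```

Catalog files are read when Presto starts, so the same line has to be added on the coordinator and on every worker, followed by a restart. This matches the setup ioah86 describes above.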