Error reading Parquet file


Query:

select internal_id from hive.db.table limit 10;

Output:

Query 20190821_080531_00002_dj2ve failed: com.facebook.presto.spi.type.VarcharType

The same query runs without errors in Hive and Athena.

Table definition:

CREATE TABLE hive.db.table (
   ...
   internal_id varchar,
   ...
)

Parquet schema:

{
  "type": "record",
  "name": "flatschema",
  "namespace": "ns",
  "fields": [ {
    "name": "internal_id",
    "type": ["null", "string"],
    "default": null
  }]
}

Version:

presto-cli --version
Presto CLI 0.220

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 7

Top GitHub Comments

2 reactions
ioah86 commented, Aug 23, 2019

@babrar and I have set this one up on all nodes, workers and coordinator. We have actually had the scenario where a new column suddenly appeared at a different position in the Parquet file than in the Hive metastore, and there was no issue.

1 reaction
babrar commented, Aug 21, 2019

There is a Stack Overflow thread for this problem: https://stackoverflow.com/questions/51918860/unable-to-query-parquet-data-with-nested-fields-in-presto-db

Problem: Presto is reading the Parquet data by column index instead of by column name.

Solution: Add hive.parquet.use-column-names=true to your hive.properties file to force Presto to read the data by column name instead of by column index.
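
To make the difference concrete, here is a minimal Python sketch of the two resolution strategies. It is a toy model, not Presto's actual reader code. The column layouts and row values are hypothetical (only internal_id comes from the issue), and the helper names read_by_index and read_by_name are invented for illustration. It assumes the Parquet file was written with one extra leading column that the table definition does not list first.

```python
# Toy model of index-based vs name-based column resolution.
# This is NOT Presto's implementation; all layouts below are hypothetical.

def read_by_index(table_columns, row, wanted):
    """Resolve `wanted` by its ordinal position in the table definition."""
    position = table_columns.index(wanted)
    return row[position]

def read_by_name(file_columns, row, wanted):
    """Resolve `wanted` by looking its name up in the file's own schema."""
    position = file_columns.index(wanted)
    return row[position]

# Assumed layouts: the metastore lists internal_id first, while the
# Parquet file carries an extra leading column (created_at).
table_columns = ["internal_id", "amount"]
file_columns = ["created_at", "internal_id", "amount"]
row = [1566374731, "abc-123", 42]  # one row, in file order

by_index = read_by_index(table_columns, row, "internal_id")
by_name = read_by_name(file_columns, row, "internal_id")

print(by_index)  # 1566374731 (an integer where a varchar was expected)
print(by_name)   # abc-123
```

In this sketch the index-based lookup returns a value of the wrong type for a varchar column, which is the kind of mismatch that could plausibly surface as a type-related failure like the one in the query output. The name-based lookup is unaffected by the column order in the file.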


Top Results From Across the Web

Error reading from parquet file that is being updated
My suspicion is that the way that the read is happening isn't atomic. Like maybe we are replacing the parquet file while the...

Read parquet file error - MATLAB Answers - MathWorks
I'm reading parquet files and facing some problems. For comparison the file was read with python using fastparquet with no errors.

Apache Spark job fails with Parquet column cannot be ...
Problem You are reading data in Parquet format and writing to a Delta table when you get a Parquet column cannot be converted...

Error reading data from parquet file ( I am pretty new to Pig)
Solved: Pig Stack Trace --------------- ERROR 1200: can't convert optional int96 uploadTime Failed to parse: - 103052.

Troubleshooting Reads from ORC and Parquet Files - Vertica
This behavior is specific to Parquet files; with an ORC file the type is correctly reported as STRING. The problem occurs because Parquet...
