Check Parquet schema mismatch 0.225 > 0.226

Hi, I just upgraded our Presto version from 0.225 to 0.235 and now get an error when running a query that worked before: "The column duration_from_duration is declared as type bigint, but the Parquet file declares the column as type INT32".

I checked the source and the release notes, and this behavior changed in 0.226: https://github.com/prestodb/presto/commit/5d18a1c01d048bec435e587887bf20ffbe9794f8

https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/parquet/ParquetPageSourceFactory.java#L319

Any suggestions on how I should fix this problem? Rewriting the Parquet files would take some time.
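To make the reported error concrete, here is a minimal Python sketch of a strict declared-type versus file-type check. It is an illustrative model only, not Presto's actual implementation: the names `STRICT_ACCEPTED`, `LENIENT_ACCEPTED`, and `check_column` are hypothetical, and the type mappings are assumptions chosen to reproduce the message quoted above. The lenient variant only illustrates that an INT32 value always fits in a 64-bit integer; it does not describe what any Presto release actually does.

```python
# Illustrative model only; NOT Presto's actual code.
# Hypothetical mapping: declared table type -> Parquet physical types
# that a strict reader would accept for it.
STRICT_ACCEPTED = {
    "integer": {"INT32"},
    "bigint": {"INT64"},
}

# Hypothetical lenient variant: also accept INT32 for bigint, since every
# 32-bit integer value fits in a 64-bit integer.
LENIENT_ACCEPTED = {
    "integer": {"INT32"},
    "bigint": {"INT64", "INT32"},
}


def check_column(column, declared_type, parquet_type, accepted=STRICT_ACCEPTED):
    """Return None if the types are compatible, otherwise an error message."""
    if parquet_type in accepted.get(declared_type, set()):
        return None
    return (
        f"The column {column} is declared as type {declared_type}, "
        f"but the Parquet file declares the column as type {parquet_type}"
    )


# Strict check reproduces the message from the question.
print(check_column("duration_from_duration", "bigint", "INT32"))
# Lenient check accepts the same column.
print(check_column("duration_from_duration", "bigint", "INT32", LENIENT_ACCEPTED))
```

Under the strict mapping, a table column declared as bigint over a file that stores INT32 is rejected, which is the situation described in the question.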

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
tarigancana commented, Jun 30, 2020

@mbasmanova Let me try that profiling to find the root cause of this problem. I just opened a new issue for this OOM, https://github.com/prestodb/presto/issues/14749, so we can close this issue 😃

1 reaction
vkorukanti commented, Jun 27, 2020

@mbasmanova Actually, this issue is fixed by #14548, but the fix is only available in 0.236. @tarigancana You can either cherry-pick #14548 or move to 0.236.
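As a rough aid for deciding whether a given deployment falls in the affected range, the following Python sketch encodes only the two version numbers mentioned in this thread: the check changed in 0.226 and the fix is available in 0.236. The helper names `parse_version` and `is_affected` are hypothetical, and the range is an assumption drawn from the comments here, not from official release notes.

```python
def parse_version(text):
    """Turn a dotted version string such as '0.235' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Assumptions taken from this thread, not from official release notes:
# the stricter check appeared in 0.226 and the fix shipped in 0.236.
FIRST_AFFECTED = parse_version("0.226")
FIRST_FIXED = parse_version("0.236")


def is_affected(version):
    """Return True if the version lies in the range discussed in this thread."""
    return FIRST_AFFECTED <= parse_version(version) < FIRST_FIXED


for v in ("0.225", "0.235", "0.236"):
    print(v, is_affected(v))
```

Versions are compared as integer tuples rather than as strings or floats, so that a hypothetical version such as 0.1000 would still sort after 0.236.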
