Issue with parquet type annotations being ignored

I have a Parquet file, registered in the Hive catalog, with the following schema:

hadoop jar ~/parquet-tools-1.9.0.jar schema file:///$(pwd)/00000.parquet 
message schema {
  optional int64 FIELDA;
  optional int64 FIELDB;
  optional int32 FIELDC (INT_16);
  optional int32 FIELDD (INT_8);
  optional int64 FIELDF (TIMESTAMP_MILLIS);
  optional int64 FIELDG (TIMESTAMP_MILLIS);
  optional int64 FIELDH;
}

and I placed the following schema on top of it:

CREATE TABLE MYTABLE
(
	FIELDA BIGINT,
	FIELDB BIGINT,
	FIELDC SMALLINT,
	FIELDD TINYINT,
	FIELDF TIMESTAMP,
	FIELDG TIMESTAMP,
	FIELDH BIGINT
) with (external_location = 's3a://some_path')

If I query it like so

select fieldc, fieldd, count(*) from mytable group by 1,2

the query runs successfully. But if I try to filter by fieldc or fieldd, I see the following error:

java.sql.SQLException: Query failed : Error opening Hive split s3a://some_path/00001.parquet : Mismatched Domain types: tinyint vs integer

If I change the external definition of those columns to integer, it works fine. It seems that the type annotation is being ignored, but why does that only affect filter operations?
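For reference, the integer workaround described above might look roughly like the sketch below. The table name mytable_int and the filter value 1 are placeholders invented for illustration; the column list and location are copied from the definition earlier in this post. This only sidesteps the error by declaring FIELDC and FIELDD with the int32 physical type that the Parquet file uses for INT_16 and INT_8. It does not explain why only filters are affected.

```sql
-- Sketch of the workaround, not a fix for the underlying problem.
-- mytable_int is a hypothetical name; FIELDC and FIELDD are declared as
-- INTEGER instead of SMALLINT / TINYINT.
CREATE TABLE mytable_int
(
    FIELDA BIGINT,
    FIELDB BIGINT,
    FIELDC INTEGER,
    FIELDD INTEGER,
    FIELDF TIMESTAMP,
    FIELDG TIMESTAMP,
    FIELDH BIGINT
) WITH (external_location = 's3a://some_path');

-- The same aggregation as before, now with a filter on one of the
-- re-declared columns (the value 1 is only an example).
SELECT fieldc, fieldd, count(*)
FROM mytable_int
WHERE fieldd = 1
GROUP BY 1, 2;
```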

Presto 0.175

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions: 1
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

drdee commented, Jul 26, 2017 (2 reactions)

We are seeing a similar issue in Presto 0.181: "Mismatched Domain types: date vs integer" in filter statements, even though both fields are of type date. Casting the date column to date solves the problem, though.

YanamadalaJaiPrakash commented, Nov 26, 2018 (1 reaction)

@drdee I came across the same issue and tried casting both fields to date, but it still did not work. However, the query works when both sides are cast to timestamp. Can anyone please help me with this?
