Issue with parquet type annotations being ignored
I have a Parquet file with the following schema (in the Hive catalog):
hadoop jar ~/parquet-tools-1.9.0.jar schema file:///$(pwd)/00000.parquet
message schema {
  optional int64 FIELDA;
  optional int64 FIELDB;
  optional int32 FIELDC (INT_16);
  optional int32 FIELDD (INT_8);
  optional int64 FIELDF (TIMESTAMP_MILLIS);
  optional int64 FIELDG (TIMESTAMP_MILLIS);
  optional int64 FIELDH;
}
and I placed the following schema on top of it:
CREATE TABLE MYTABLE
(
  FIELDA BIGINT,
  FIELDB BIGINT,
  FIELDC SMALLINT,
  FIELDD TINYINT,
  FIELDF TIMESTAMP,
  FIELDG TIMESTAMP,
  FIELDH BIGINT
) with (external_location = 's3a://some_path')
If I query it like so
select fieldc, fieldd, count(*) from mytable group by 1,2
the query runs successfully. But if I try to filter by fieldc or fieldd, I see the following error:
java.sql.SQLException: Query failed : Error opening Hive split s3a://some_path/00001.parquet : Mismatched Domain types: tinyint vs integer
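The failing query is not shown in the report. For illustration only, a filter of the kind that triggers this error might look like the sketch below; the literal value `1` is an assumption and is not taken from the original post.

```sql
-- Hypothetical filter on the SMALLINT column FIELDC.
-- The comparison value 1 is an assumption made for illustration.
select fieldc, fieldd, count(*)
from mytable
where fieldc = 1
group by 1, 2
```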
If I change the external definition to integer, it works fine. It seems to me that the type annotation is being ignored, but why does it only affect filter operations?
Presto 0.175
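The workaround mentioned above (changing the external definition to integer) can be sketched as follows. This assumes the same table name and location as in the post, with only FIELDC and FIELDD redeclared; it is not a confirmed fix, only the change the report describes.

```sql
-- Sketch of the reported workaround: FIELDC and FIELDD are declared
-- as INTEGER instead of SMALLINT / TINYINT. Everything else is unchanged.
CREATE TABLE MYTABLE
(
  FIELDA BIGINT,
  FIELDB BIGINT,
  FIELDC INTEGER,
  FIELDD INTEGER,
  FIELDF TIMESTAMP,
  FIELDG TIMESTAMP,
  FIELDH BIGINT
) with (external_location = 's3a://some_path')
```

With this definition the declared column type matches the Parquet physical type (int32), at the cost of losing the narrower SMALLINT / TINYINT types in the table schema.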
We are seeing a similar issue to the one described here in Presto 0.181:
Mismatched Domain types: date vs integer
in filter statements, even though both fields are of type date. Casting the date column as a date solves the problem, though.

@drdee I came across the same issue and tried casting both fields to date, but it still did not work. However, after casting both sides to timestamp, the query works. Can anyone please help me with this?
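The two cast-based workarounds described in these comments might look roughly like the sketch below. The table and column names (`events`, `event_date`, `other_date`) and the date literal are hypothetical; only the CAST pattern comes from the comments.

```sql
-- Hypothetical names: events, event_date, other_date.

-- First comment: cast the date column to date explicitly in the filter.
select count(*)
from events
where cast(event_date as date) = date '2017-01-01';

-- Second comment: cast both sides of the comparison to timestamp.
select count(*)
from events
where cast(event_date as timestamp) = cast(other_date as timestamp);
```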