
Spark can't read Iceberg table created from Presto

Spark can't read a table that was created in Presto:

create table iceberg.examples.test_table 
with (format = 'parquet')
as select timestamp '2021-01-19 23:59:59.999999' as ts;

In Spark:

spark.read
      .format("iceberg")
      .load("examples.test_table")
      .show(10, false)

Fails with the exception java.lang.UnsupportedOperationException: Spark does not support timestamp without time zone fields.
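As a possible read-side workaround on newer runtimes: later Iceberg releases (well after this issue was filed) added a session property that tells the Spark runtime to surface timestamp-without-zone columns as Spark's TimestampType instead of throwing. A minimal sketch, assuming an Iceberg Spark runtime recent enough to recognize the flag:

import org.apache.spark.sql.SparkSession

// Assumption: the Iceberg runtime on the classpath supports
// spark.sql.iceberg.handle-timestamp-without-timezone (it did not exist
// when this issue was opened). With the flag set, Iceberg maps its
// timestamp (without zone) type to Spark's TimestampType on read.
val spark = SparkSession.builder()
  .appName("iceberg-ntz-read")
  .config("spark.sql.iceberg.handle-timestamp-without-timezone", "true")
  .getOrCreate()

spark.read
  .format("iceberg")
  .load("examples.test_table")
  .show(10, false)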

Please add support for the timestamp without time zone type to the Iceberg Spark runtime, since Iceberg itself supports this type.

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 4
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

3 reactions
davseitsev commented, Jan 26, 2021

According to the Spark docs, TimestampType represents values comprising the fields year, month, day, hour, minute, and second, with the session-local time zone; the timestamp value represents an absolute point in time. So it's more like LocalDateTime. Why can't Iceberg expose timestamp without time zone as a Spark timestamp?
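One quick way to observe the session-time-zone behaviour described above from a spark-shell (the stored instant stays fixed; only the rendered string follows the session zone). This is a sketch, assuming a Spark 3.x shell where the timestamp literal is resolved when the query is parsed:

// The literal is fixed at parse time; show() converts it to a string
// using whatever spark.sql.session.timeZone is at display time.
spark.conf.set("spark.sql.session.timeZone", "UTC")
val df = spark.sql("select timestamp '2021-01-19 23:59:59.999999' as ts")
df.show(false)  // 2021-01-19 23:59:59.999999

spark.conf.set("spark.sql.session.timeZone", "America/New_York")
df.show(false)  // the same instant, rendered as 2021-01-19 18:59:59.999999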

As far as I understand, the problem is that Spark has only one timestamp type while Iceberg has two, which causes some inconsistency in the type mapping between Spark and Iceberg.
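For reference, the two Iceberg types in question are distinct values in the Iceberg API, while Spark (at the time of this issue) has only a single org.apache.spark.sql.types.TimestampType to map them to:

import org.apache.iceberg.types.Types

// Iceberg's two timestamp types; Spark has just one TimestampType.
val tsWithZone    = Types.TimestampType.withZone()     // timestamptz
val tsWithoutZone = Types.TimestampType.withoutZone()  // timestamp, the type Presto used above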

In my opinion it could be solved in the following way:

  1. For read access there is no problem: both Iceberg types (timestamp, timestamptz) could be exposed as Spark timestamp.
  2. For inserts there is no problem either, because the target type is defined in the table metadata, so the correct one can be chosen.
  3. For table creation there is an ambiguity, which could be resolved by a configuration property. Spark already has such a property for choosing how timestamp fields are stored in Parquet (spark.sql.parquet.outputTimestampType = INT96 | TIMESTAMP_MICROS | TIMESTAMP_MILLIS). Why can't we have the same kind of logic for Iceberg? (A hypothetical sketch of this follows the list.)
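A hypothetical sketch of point 3 above. The property name spark.sql.iceberg.outputTimestampType is invented here for illustration, mirroring Spark's existing spark.sql.parquet.outputTimestampType; only the Iceberg type factories (Types.TimestampType.withZone / withoutZone) are real API:

import org.apache.iceberg.types.Types

// Hypothetical: pick the Iceberg timestamp type for columns of new tables
// from a configuration value, analogous to spark.sql.parquet.outputTimestampType.
def icebergTimestampFor(outputType: String): Types.TimestampType =
  outputType match {
    case "TIMESTAMP"   => Types.TimestampType.withoutZone() // Iceberg timestamp
    case "TIMESTAMPTZ" => Types.TimestampType.withZone()    // Iceberg timestamptz
    case other =>
      throw new IllegalArgumentException(s"Unsupported output timestamp type: $other")
  }

// Consulted during table creation, falling back to timestamptz, e.g.:
// icebergTimestampFor(spark.conf.get("spark.sql.iceberg.outputTimestampType", "TIMESTAMPTZ"))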